(54) Linear prediction coefficient generation during frame erasure or packet loss.

(57) A speech coding system robust to frame erasure (or packet loss) is described. Illustrative
embodiments are directed to a modified version of CCITT standard G.728. In the event of frame erasure,
vectors of an excitation signal are synthesized based on previously stored excitation signal vectors generated
during non-erased frames. This synthesis differs for voiced and non-voiced speech. During erased
frames, linear prediction filter coefficients are synthesized as a weighted extrapolation of a set of linear
prediction filter coefficients determined during non-erased frames. The weighting factor is a number
less than 1. This weighting accomplishes a bandwidth-expansion of peaks in the frequency response of
a linear predictive filter. Computational complexity during erased frames is reduced through the
elimination of certain computations needed during non-erased frames only. This reduction in
computational complexity offsets additional computation required for excitation signal synthesis and linear
prediction filter coefficient generation during erased frames.

Field of the Invention

The present invention relates generally to speech coding arrangements for use in wireless communication
systems, and more particularly to the ways in which such speech coders function in the event of burst-like
errors in wireless transmission.

Background of the Invention 

Many communication systems, such as cellular telephone and personal communications systems, rely on
wireless channels to communicate information. In the course of communicating such information, wireless
communication channels can suffer from several sources of error, such as multipath fading. These error sources
can cause, among other things, the problem of frame erasure. An erasure refers to the total loss or substantial
corruption of a set of bits communicated to a receiver. A frame is a predetermined fixed number of bits.

If a frame of bits is totally lost, then the receiver has no bits to interpret. Under such circumstances, the
receiver may produce a meaningless result. If a frame of received bits is corrupted and therefore unreliable,
the receiver may produce a severely distorted result.

As the demand for wireless system capacity has increased, a need has arisen to make the best use of
available wireless system bandwidth. One way to enhance the efficient use of system bandwidth is to employ
a signal compression technique. For wireless systems which carry speech signals, speech compression (or
speech coding) techniques may be employed for this purpose. Such speech coding techniques include
analysis-by-synthesis speech coders, such as the well-known code-excited linear prediction (or CELP) speech
coder.

The problem of packet loss in packet-switched networks employing speech coding arrangements is very
similar to frame erasure in the wireless context. That is, due to packet loss, a speech decoder may either fail
to receive a frame or receive a frame having a significant number of missing bits. In either case, the speech
decoder is presented with the same essential problem: the need to synthesize speech despite the loss of
compressed speech information. Both "frame erasure" and "packet loss" concern a communication channel
(or network) problem which causes the loss of transmitted bits. For purposes of this description, therefore,
the term "frame erasure" may be deemed synonymous with packet loss.

CELP speech coders employ a codebook of excitation signals to encode an original speech signal. These
excitation signals are used to "excite" a linear predictive (LPC) filter which synthesizes a speech signal (or some
precursor to a speech signal) in response to the excitation. The synthesized speech signal is compared to the
signal to be coded. The codebook excitation signal which most closely matches the original signal is identified.
The identified excitation signal's codebook index is then communicated to a CELP decoder (depending upon
the type of CELP system, other types of information may be communicated as well). The decoder contains a
codebook identical to that of the CELP coder. The decoder uses the transmitted index to select an excitation
signal from its own codebook. This selected excitation signal is used to excite the decoder's LPC filter. Thus
excited, the LPC filter of the decoder generates a decoded (or quantized) speech signal, namely the same speech
signal which was previously determined to be closest to the original speech signal.
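
The analysis-by-synthesis search described above can be sketched as follows. This is a toy illustration only: the four-entry codebook, the first-order filter coefficient, and the names `synthesize`, `encode` and `decode` are hypothetical, and a practical CELP coder would add gain terms, perceptual weighting and filter memory carried across vectors.

```python
# Toy analysis-by-synthesis search. Codebook entries and the filter
# coefficient are illustrative assumptions, not values from any standard.
CODEBOOK = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, -1.0, 1.0, -1.0, 1.0],
    [0.5, 0.5, 0.5, 0.5, 0.5],
]
A = [-0.9]  # assumed coefficients of A(z) = 1 + sum_i a_i z^-i


def synthesize(excitation, a=A):
    """Run the excitation through the all-pole filter 1/A(z), zero initial memory."""
    out = []
    for n, x in enumerate(excitation):
        feedback = sum(a[i] * out[n - 1 - i]
                       for i in range(len(a)) if n - 1 - i >= 0)
        out.append(x - feedback)
    return out


def encode(target):
    """Return the index of the codevector whose synthesized output best matches target."""
    def squared_error(index):
        synth = synthesize(CODEBOOK[index])
        return sum((s - t) ** 2 for s, t in zip(synth, target))
    return min(range(len(CODEBOOK)), key=squared_error)


def decode(index):
    """The decoder holds the same codebook and filter, so it rebuilds the same signal."""
    return synthesize(CODEBOOK[index])
```

Only the index crosses the channel in this sketch, which is why a lost index leaves the decoder with nothing to look up.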

Wireless and other systems which employ speech coders may be more sensitive to the problem of frame
erasure than those systems which do not compress speech. This sensitivity is due to the reduced redundancy
of coded speech (compared to uncoded speech) making the possible loss of each communicated bit more
significant. In the context of a CELP speech coder experiencing frame erasure, excitation signal codebook
indices may be either lost or substantially corrupted. Because of the erased frame(s), the CELP decoder will
not be able to reliably identify which entry in its codebook should be used to synthesize speech. As a result,
speech coding system performance may degrade significantly.

As a result of lost excitation signal codebook indices, normal techniques for synthesizing an excitation
signal in a decoder are ineffective. These techniques must therefore be replaced by alternative measures. A
further result of the loss of codebook indices is that the normal signals available for use in generating linear
prediction coefficients are unavailable. Therefore, an alternative technique for generating such coefficients is
needed.

Summary of the Invention

The present invention generates linear prediction coefficient signals during frame erasure based on a
weighted extrapolation of linear prediction coefficient signals generated during a non-erased frame. This
weighted extrapolation accomplishes an expansion of the bandwidth of peaks in the frequency response of a
linear prediction filter.

Illustratively, linear prediction coefficient signals generated during a non-erased frame are stored in a buffer
memory. When a frame erasure occurs, the last "good" set of coefficient signals is weighted by a bandwidth
expansion factor raised to an exponent. The exponent is the index identifying the coefficient of interest.
The factor is a number in the range of 0.95 to 0.99.


Brief Description of the Drawings 

Figure 1 presents a block diagram of a G.728 decoder modified in accordance with the present invention.

Figure 2 presents a block diagram of an illustrative excitation synthesizer of Figure 1 in accordance with
the present invention.

Figure 3 presents a block-flow diagram of the synthesis mode operation of an excitation synthesis
processor of Figure 2.

Figure 4 presents a block-flow diagram of an alternative synthesis mode operation of the excitation
synthesis processor of Figure 2.

Figure 5 presents a block-flow diagram of the LPC parameter bandwidth expansion performed by the
bandwidth expander of Figure 1.

Figure 6 presents a block diagram of the signal processing performed by the synthesis filter adapter of
Figure 1.

Figure 7 presents a block diagram of the signal processing performed by the vector gain adapter of
Figure 1.

Figures 8 and 9 present a modified version of an LPC synthesis filter adapter and vector gain adapter,
respectively, for G.728.

Figures 10 and 11 present an LPC filter frequency response and a bandwidth-expanded version of same,
respectively.

Figure 12 presents an illustrative wireless communication system in accordance with the present
invention.

Detailed Description 
I. Introduction

The present invention concerns the operation of a speech coding system experiencing frame erasure,
that is, the loss of a group of consecutive bits in the compressed bit-stream which group is ordinarily used to
synthesize speech. The description which follows concerns features of the present invention applied
illustratively to the well-known 16 kbit/s low-delay CELP (LD-CELP) speech coding system adopted by the CCITT as
its international standard G.728 (for the convenience of the reader, the draft recommendation which was
adopted as the G.728 standard is attached hereto as an Appendix; the draft will be referred to herein as the "G.728
standard draft"). This description notwithstanding, those of ordinary skill in the art will appreciate that features
of the present invention have applicability to other speech coding systems.

The G.728 standard draft includes detailed descriptions of the speech encoder and decoder of the standard
(see G.728 standard draft, sections 3 and 4). The first illustrative embodiment concerns modifications to the
decoder of the standard. While no modifications to the encoder are required to implement the present invention,
the present invention may be augmented by encoder modifications. In fact, one illustrative speech coding
system described below includes a modified encoder.

Knowledge of the erasure of one or more frames is an input to the illustrative embodiment of the present
invention. Such knowledge may be obtained in any of the conventional ways well known in the art. For example,
frame erasures may be detected through the use of a conventional error detection code. Such a code would
be implemented as part of a conventional radio transmission/reception subsystem of a wireless communication
system.

For purposes of this description, the output signal of the decoder's LPC synthesis filter, whether in the
speech domain or in a domain which is a precursor to the speech domain, will be referred to as the "speech
signal." Also, for clarity of presentation, an illustrative frame will be an integral multiple of the length of an
adaptation cycle of the G.728 standard. This illustrative frame length is, in fact, reasonable and allows presentation
of the invention without loss of generality. It may be assumed, for example, that a frame is 10 ms in duration
or four times the length of a G.728 adaptation cycle. The adaptation cycle is 20 samples and corresponds to
a duration of 2.5 ms.
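
The frame arithmetic above can be checked with a few lines. This is a minimal sketch; the 8 kHz sampling rate is inferred from "20 samples in 2.5 ms" rather than stated in this passage, and the constant names are hypothetical. The five-sample vector and four-vector adaptation cycle are the figures given elsewhere in this description.

```python
SAMPLE_RATE_HZ = 8000    # assumption: implied by 20 samples lasting 2.5 ms
VECTOR_LEN = 5           # samples per excitation vector
VECTORS_PER_CYCLE = 4    # vectors per adaptation cycle

CYCLE_LEN = VECTOR_LEN * VECTORS_PER_CYCLE           # 20 samples
cycle_ms = 1000 * CYCLE_LEN / SAMPLE_RATE_HZ         # 2.5 ms

FRAME_MS = 10                                        # illustrative frame duration
frame_samples = SAMPLE_RATE_HZ * FRAME_MS // 1000    # 80 samples
cycles_per_frame = frame_samples // CYCLE_LEN        # 4 adaptation cycles
vectors_per_frame = frame_samples // VECTOR_LEN      # 16 excitation vectors
```

Under these assumptions an erased 10 ms frame requires 16 substitute excitation vectors.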

For clarity of explanation, the illustrative embodiment of the present invention is presented as comprising
individual functional blocks. The functions these blocks represent may be provided through the use of either
shared or dedicated hardware, including, but not limited to, hardware capable of executing software. For
example, the blocks presented in Figures 1, 2, 6, and 7 may be provided by a single shared processor. (Use of
the term "processor" should not be construed to refer exclusively to hardware capable of executing software.)
Illustrative embodiments may comprise digital signal processor (DSP) hardware, such as the AT&T DSP16
or DSP32C, read-only memory (ROM) for storing software performing the operations discussed below, and
random access memory (RAM) for storing DSP results. Very large scale integration (VLSI) hardware
embodiments, as well as custom VLSI circuitry in combination with a general purpose DSP circuit, may also be
provided.

II. An Illustrative Embodiment

Figure 1 presents a block diagram of a G.728 LD-CELP decoder modified in accordance with the present
invention (Figure 1 is a modified version of figure 3 of the G.728 standard draft). In normal operation (i.e.,
without experiencing frame erasure) the decoder operates in accordance with G.728. It first receives codebook
indices, i, from a communication channel. Each index represents a vector of five excitation signal samples
which may be obtained from excitation VQ codebook 29. Codebook 29 comprises gain and shape codebooks
as described in the G.728 standard draft. Codebook 29 uses each received index to extract an excitation
codevector. The extracted codevector is that which was determined by the encoder to be the best match with
the original signal. Each extracted excitation codevector is scaled by gain amplifier 31. Amplifier 31 multiplies
each sample of the excitation vector by a gain determined by vector gain adapter 300 (the operation of vector
gain adapter 300 is discussed below). Each scaled excitation vector, ET, is provided as an input to an excitation
synthesizer 100. When no frame erasures occur, synthesizer 100 simply outputs the scaled excitation vectors
without change. Each scaled excitation vector is then provided as input to an LPC synthesis filter 32. The LPC
synthesis filter 32 uses LPC coefficients provided by a synthesis filter adapter 330 through switch 120 (switch
120 is configured according to the "dashed" line when no frame erasure occurs; the operation of synthesis
filter adapter 330, switch 120, and bandwidth expander 115 is discussed below). Filter 32 generates decoded
(or "quantized") speech. Filter 32 is a 50th order synthesis filter capable of introducing periodicity in the
decoded speech signal (such periodicity enhancement generally requires a filter of order greater than 20). In
accordance with the G.728 standard, this decoded speech is then postfiltered by operation of postfilter 34 and
postfilter adapter 35. Once postfiltered, the format of the decoded speech is converted to an appropriate
standard format by format converter 28. This format conversion facilitates subsequent use of the decoded speech
by other systems.
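
The gain-scaling and synthesis-filter stages of this normal-mode path might be sketched as below. The class and function names are hypothetical, the sign convention A(z) = 1 + sum a_i z^-i is an assumption, and the codebook lookup, gain adaptation, postfilter and format conversion are all omitted.

```python
def gain_scale(codevector, gain):
    """Amplifier 31: multiply each sample of the codevector by the gain."""
    return [gain * s for s in codevector]


class SynthesisFilter:
    """All-pole filter 1/A(z) with A(z) = 1 + sum_{i=1..P} a_i z^-i.

    The sign convention is an assumption made for this sketch.
    """

    def __init__(self, order=50):
        self.a = [0.0] * order        # a[0] holds a_1
        self.memory = [0.0] * order   # past outputs, most recent first

    def set_coefficients(self, a):
        if len(a) != len(self.memory):
            raise ValueError("coefficient count must equal the filter order")
        self.a = list(a)

    def process(self, et):
        """Filter one gain-scaled excitation vector ET; memory persists across calls."""
        out = []
        for x in et:
            y = x - sum(ai * mi for ai, mi in zip(self.a, self.memory))
            self.memory = [y] + self.memory[:-1]
            out.append(y)
        return out
```

A short usage example with a second-order filter, chosen only to keep the numbers readable:

```python
f = SynthesisFilter(order=2)
f.set_coefficients([-0.5, 0.0])
speech = f.process(gain_scale([1.0, 0.0, 0.0, 0.0, 0.0], 2.0))
```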

A. Excitation Signal Synthesis During Frame Erasure 


In the presence of frame erasures, the decoder of Figure 1 does not receive reliable information (if it
receives anything at all) concerning which vector of excitation signal samples should be extracted from codebook
29. In this case, the decoder must obtain a substitute excitation signal for use in synthesizing a speech signal.
The generation of a substitute excitation signal during periods of frame erasure is accomplished by excitation
synthesizer 100.

Figure 2 presents a block diagram of an illustrative excitation synthesizer 100 in accordance with the
present invention. During frame erasures, excitation synthesizer 100 generates one or more vectors of excitation
signal samples based on previously determined excitation signal samples. These previously determined
excitation signal samples were extracted using codebook indices previously received from the
communication channel. As shown in Figure 2, excitation synthesizer 100 includes tandem switches 110, 130 and
excitation synthesis processor 120. Switches 110, 130 respond to a frame erasure signal to switch the mode
of the synthesizer 100 between normal mode (no frame erasure) and synthesis mode (frame erasure). The
frame erasure signal is a binary flag which indicates whether the current frame is normal (e.g., a value of "0")
or erased (e.g., a value of "1"). This binary flag is refreshed for each frame.


1. Normal Mode 

In normal mode (shown by the dashed lines in switches 110 and 130), synthesizer 100 receives gain-scaled
excitation vectors, ET (each of which comprises five excitation sample values), and passes those vectors to
its output. Vector sample values are also passed to excitation synthesis processor 120. Processor 120 stores
these sample values in a buffer, ETPAST, for subsequent use in the event of frame erasure. ETPAST holds
200 of the most recent excitation signal sample values (i.e., 40 vectors) to provide a history of recently received
(or synthesized) excitation signal values. When ETPAST is full, each successive vector of five samples pushed
into the buffer causes the oldest vector of five samples to fall out of the buffer. (As will be discussed below
with reference to the synthesis mode, the history of vectors may include those vectors generated in the event
of frame erasure.)
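
The ETPAST history behaves like a fixed-length first-in, first-out buffer. A minimal sketch follows, assuming the buffer starts zero-filled (the text does not say how it is initialised); the class name and the `samples_back` helper are hypothetical.

```python
from collections import deque

ETPAST_LEN = 200   # 40 vectors of 5 samples
VECTOR_LEN = 5


class ExcitationHistory:
    """Sliding history of the most recent excitation samples, oldest first."""

    def __init__(self):
        # Assumption: start from silence so the buffer is always full.
        self.etpast = deque([0.0] * ETPAST_LEN, maxlen=ETPAST_LEN)

    def push(self, et):
        """Append one vector; with maxlen set, the oldest five samples fall out."""
        if len(et) != VECTOR_LEN:
            raise ValueError("expected a five-sample vector")
        self.etpast.extend(et)

    def samples_back(self, lag, count=VECTOR_LEN):
        """Return `count` consecutive samples, the earliest being `lag` samples in the past."""
        buf = list(self.etpast)
        start = len(buf) - lag
        return buf[start:start + count]
```

The same `push` call serves both received and synthesized vectors, which is how synthesized vectors come to be part of the history.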

2. Synthesis Mode

In synthesis mode (shown by the solid lines in switches 110 and 130), synthesizer 100 decouples the gain- 
scaled excitation vector input and couples the excitation synthesis processor 120 to the synthesizer output. 
Processor 120, in response to the frame erasure signal, operates to synthesize excitation signal vectors. 

Figure 3 presents a block-flow diagram of the operation of processor 120 in synthesis mode. At the outset
of processing, processor 120 determines whether erased frame(s) are likely to have contained voiced speech
(see step 1201). This may be done by conventional voiced speech detection on past speech samples. In the
context of the G.728 decoder, a signal PTAP is available (from the postfilter) which may be used in a voiced
speech decision process. PTAP represents the optimal weight of a single-tap pitch predictor for the decoded
speech. If PTAP is large (e.g., close to 1), then the erased speech is likely to have been voiced. If PTAP is
small (e.g., close to 0), then the erased speech is likely to have been non-voiced (i.e., unvoiced speech,
silence, noise). An empirically determined threshold, VTH, is used to make a decision between voiced and
non-voiced speech. This threshold is equal to 0.6/1.4 (where 0.6 is a voicing threshold used by the G.728 postfilter
and 1.4 is an experimentally determined number which reduces the threshold so as to err on the side of voiced
speech).
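
The voicing decision of step 1201 reduces to a single comparison. In the sketch below the function name is hypothetical, and treating a value exactly equal to VTH as non-voiced is an assumption, since the text does not say which way the boundary falls.

```python
VTH = 0.6 / 1.4   # roughly 0.43; lowered from 0.6 to favour the voiced decision


def is_voiced(ptap, vth=VTH):
    """Return True if the erased frame is likely to have contained voiced speech."""
    return ptap > vth   # assumption: strict comparison at the boundary
```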

If the erased frame(s) is determined to have contained voiced speech, a new gain-scaled excitation vector
ET is synthesized by locating a vector of samples within buffer ETPAST, the earliest of which is KP samples
in the past (see step 1204). KP is a sample count corresponding to one pitch-period of voiced speech. KP may
be determined conventionally from decoded speech; however, the postfilter of the G.728 decoder has this
value already computed. Thus, the synthesis of a new vector, ET, comprises an extrapolation (e.g., copying) of
a set of 5 consecutive samples into the present. Buffer ETPAST is updated to reflect the latest synthesized
vector of sample values, ET (see step 1206). This process is repeated until a good (non-erased) frame is
received (see steps 1208 and 1209). The process of steps 1204, 1206, 1208 and 1209 amounts to a periodic
repetition of the last KP samples of ETPAST and produces a periodic sequence of ET vectors in the erased frame(s)
(where KP is the period). When a good (non-erased) frame is received, the process ends.
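
The voiced-speech extrapolation of steps 1204 and 1206 might look like the following. This is a sketch under stated assumptions: ETPAST is modelled as a plain list with the oldest sample first, KP is assumed to be at least five and no larger than the buffer length, and the function name is hypothetical.

```python
def synthesize_voiced(etpast, kp, num_vectors, vector_len=5):
    """Copy samples from one pitch period back, updating the history in place.

    etpast: list of past excitation samples, oldest first (fixed length).
    kp: pitch period in samples; assumed vector_len <= kp <= len(etpast).
    """
    vectors = []
    for _ in range(num_vectors):
        start = len(etpast) - kp
        et = etpast[start:start + vector_len]   # step 1204: earliest sample is KP back
        vectors.append(et)
        etpast.extend(et)                       # step 1206: push the new vector
        del etpast[:vector_len]                 # ... and let the oldest one fall out
    return vectors
```

Because each synthesized vector is written back into the history before the next one is read, the output repeats with period KP, as the text describes.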

If the erased frame(s) is determined to have contained non-voiced speech (by step 1201), then a different
synthesis procedure is implemented. An illustrative synthesis of ET vectors is based on a randomized
extrapolation of groups of five samples in ETPAST. This randomized extrapolation procedure begins with the
computation of an average magnitude of the most recent 40 samples of ETPAST (see step 1210). This average
magnitude is designated as AVMAG. AVMAG is used in a process which ensures that extrapolated ET vector
samples have the same average magnitude as the most recent 40 samples of ETPAST.

A random integer number, NUMR, is generated to introduce a measure of randomness into the excitation
synthesis process. This randomness is important because the erased frame contained unvoiced speech (as
determined by step 1201). NUMR may take on any integer value between 5 and 40, inclusive (see step 1212).
Five consecutive samples of ETPAST are then selected, the oldest of which is NUMR samples in the past (see
step 1214). The average magnitude of these selected samples is then computed (see step 1216). This average
magnitude is termed VECAV. A scale factor, SF, is computed as the ratio of AVMAG to VECAV (see step 1218).
Each sample selected from ETPAST is then multiplied by SF. The scaled samples are then used as the
synthesized samples of ET (see step 1220). These synthesized samples are also used to update ETPAST as
described above (see step 1222).

If more synthesized samples are needed to fill an erased frame (see step 1224), steps 1212-1222 are re- 
peated until the erased frame has been filled. If a consecutive subsequent frame(s) is also erased (see step 
1226), steps 1210-1224 are repeated to fill the subsequent erased frame(s). When all consecutive erased 
frames are filled with synthesized ET vectors, the process ends. 
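
Steps 1210 through 1222 can be sketched as below for one erased frame. The function names are hypothetical, "average magnitude" is taken to mean the mean of absolute values, and the guard against a zero VECAV is an added assumption that the text does not address.

```python
import random


def average_magnitude(samples):
    """Mean absolute value (assumed meaning of 'average magnitude')."""
    return sum(abs(s) for s in samples) / len(samples)


def synthesize_unvoiced(etpast, num_vectors, rng=random, vector_len=5):
    """Randomized extrapolation for one erased frame; updates etpast in place.

    etpast: list of past excitation samples, oldest first (fixed length).
    rng: any object with randint(a, b), e.g. the random module or random.Random.
    """
    avmag = average_magnitude(etpast[-40:])             # step 1210
    vectors = []
    for _ in range(num_vectors):
        numr = rng.randint(5, 40)                       # step 1212 (both ends inclusive)
        start = len(etpast) - numr
        selected = etpast[start:start + vector_len]     # step 1214
        vecav = average_magnitude(selected)             # step 1216
        sf = avmag / vecav if vecav > 0 else 0.0        # step 1218; zero guard is an assumption
        et = [sf * s for s in selected]                 # step 1220
        vectors.append(et)
        etpast.extend(et)                               # step 1222
        del etpast[:vector_len]
    return vectors
```

AVMAG is computed once per frame here, matching the text's instruction to repeat from step 1212 within a frame and from step 1210 for each further erased frame.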


3. Alternative Synthesis Mode for Non-voiced Speech 

Figure 4 presents a block-flow diagram of an alternative operation of processor 120 in excitation synthesis
mode. In this alternative, processing for voiced speech is identical to that described above with reference to
Figure 3. The difference between alternatives is found in the synthesis of ET vectors for non-voiced speech.
Because of this, only that processing associated with non-voiced speech is presented in Figure 4.

As shown in the Figure, synthesis of ET vectors for non-voiced speech begins with the computation of
correlations between the most recent block of 30 samples stored in buffer ETPAST and every other block of 30
samples of ETPAST which lags the most recent block by between 31 and 170 samples (see step 1230). For
example, the most recent 30 samples of ETPAST are first correlated with the block of ETPAST
samples 32-61, inclusive. Next, the most recent block of 30 samples is correlated with samples of ETPAST
between 33-62, inclusive, and so on. The process continues for all blocks of 30 samples up to the block
containing samples between 171-200, inclusive.

For all computed correlation values greater than a threshold value, THC, a time lag (MAXI) corresponding
to the maximum correlation is determined (see step 1232).

Next, tests are made to determine whether the erased frame likely exhibited very low periodicity. Under
circumstances of such low periodicity, it is advantageous to avoid the introduction of artificial periodicity into
the ET vector synthesis process. This is accomplished by varying the value of time lag MAXI. If either (i) PTAP
is less than a threshold, VTH1 (see step 1234), or (ii) the maximum correlation corresponding to MAXI is less
than a constant, MAXC (see step 1236), then very low periodicity is found. As a result, MAXI is incremented
by 1 (see step 1238). If neither of conditions (i) and (ii) is satisfied, MAXI is not incremented. Illustrative values
for VTH1 and MAXC are 0.3 and 3 × 10^7, respectively.

MAXI is then used as an index to extract a vector of samples from ETPAST. The earliest of the extracted
samples is MAXI samples in the past. These extracted samples serve as the next ET vector (see step 1240).
As before, buffer ETPAST is updated with the newest ET vector samples (see step 1242).

If additional samples are needed to fill the erased frame (see step 1244), then steps 1234-1242 are re- 
peated. After all samples in the erased frame have been filled, samples in each subsequent erased frame are 
filled (see step 1246) by repeating steps 1230-1244. When all consecutive erased frames are filled with syn- 
thesized ET vectors, the process ends. 
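
The correlation search and lag perturbation of steps 1230 through 1242 might be sketched as follows for one erased frame. The function names are hypothetical, no value for THC is given in the text so it is left as a required argument, and raising an error when no correlation exceeds THC is an assumption because that case is not described.

```python
def correlate_lags(etpast, block=30, min_lag=31, max_lag=170):
    """Step 1230: correlate the newest block with each lagged block; returns {lag: value}."""
    n = len(etpast)
    recent = etpast[n - block:]
    corr = {}
    for lag in range(min_lag, max_lag + 1):
        past = etpast[n - block - lag:n - lag]
        corr[lag] = sum(x * y for x, y in zip(recent, past))
    return corr


def pick_lag(corr, thc):
    """Step 1232: lag of the largest correlation among those exceeding THC."""
    candidates = {lag: c for lag, c in corr.items() if c > thc}
    if not candidates:
        return None, None   # behaviour for this case is not specified in the text
    maxi = max(candidates, key=candidates.get)
    return maxi, candidates[maxi]


def synthesize_alt(etpast, ptap, num_vectors, thc, vth1=0.3, maxc=3e7, vector_len=5):
    """Fill one erased frame; etpast is a fixed-length list, oldest sample first."""
    maxi, cmax = pick_lag(correlate_lags(etpast), thc)
    if maxi is None:
        raise ValueError("no correlation exceeds THC")   # assumption
    vectors = []
    for _ in range(num_vectors):
        if ptap < vth1 or cmax < maxc:                   # steps 1234 and 1236
            maxi += 1                                    # step 1238
        start = len(etpast) - maxi
        et = etpast[start:start + vector_len]            # step 1240
        vectors.append(et)
        etpast.extend(et)                                # step 1242
        del etpast[:vector_len]
    return vectors
```

When the low-periodicity test succeeds, the lag grows by one for every vector, so successive vectors are drawn from shifting offsets and no fixed period is imposed on the output.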

B. LPC Filter Coefficients for Erased Frames 

In addition to the synthesis of gain-scaled excitation vectors, ET, LPC filter coefficients must be generated
during erased frames. In accordance with the present invention, LPC filter coefficients for erased frames are
generated through a bandwidth expansion procedure. This bandwidth expansion procedure helps account for
uncertainty in the LPC filter frequency response in erased frames. Bandwidth expansion softens the
sharpness of peaks in the LPC filter frequency response.

Figure 10 presents an illustrative LPC filter frequency response based on LPC coefficients determined
for a non-erased frame. As can be seen, the response contains certain "peaks." It is the proper location of these
peaks during frame erasure which is a matter of some uncertainty. For example, correct frequency response
for a consecutive frame might look like that response of Figure 10 with the peaks shifted to the right or to the
left. During frame erasure, since decoded speech is not available to determine LPC coefficients, these
coefficients (and hence the filter frequency response) must be estimated. Such an estimation may be
accomplished through bandwidth expansion. The result of an illustrative bandwidth expansion is shown in Figure 11. As
may be seen from Figure 11, the peaks of the frequency response are attenuated resulting in an expanded
3 dB bandwidth of the peaks. Such attenuation helps account for shifts in a "correct" frequency response which
cannot be determined because of frame erasure.
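
The attenuation of peaks can be illustrated with a second-order resonator, for which the pole radius is easy to read off. The sketch below is illustrative only: the radius, angle and factor are arbitrary choices, the convention A(z) = 1 + a_1 z^-1 + a_2 z^-2 is an assumption, and the weighting of each coefficient by the factor raised to the coefficient index is the scheme stated in the Summary.

```python
import cmath
import math


def response_magnitude(a, omega):
    """|1/A(e^{j*omega})| for A(z) = 1 + sum_i a[i-1] z^-i (assumed convention)."""
    denom = 1 + sum(ai * cmath.exp(-1j * omega * (i + 1)) for i, ai in enumerate(a))
    return 1 / abs(denom)


r, theta, bef = 0.98, math.pi / 4, 0.97        # arbitrary illustrative values

# Complex pole pair at radius r and angle +/- theta.
a = [-2 * r * math.cos(theta), r * r]

# Weight coefficient a_i by bef**i.
a_expanded = [bef ** (i + 1) * ai for i, ai in enumerate(a)]

radius_before = math.sqrt(a[1])                # equals r
radius_after = math.sqrt(a_expanded[1])        # equals bef * r
peak_before = response_magnitude(a, theta)
peak_after = response_magnitude(a_expanded, theta)
```

The weighting moves the poles from radius r to radius bef * r without changing their angle, so the peak stays at the same frequency but becomes lower and broader.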

According to the G.728 standard, LPC coefficients are updated at the third vector of each four-vector
adaptation cycle. The presence of erased frames need not disturb this timing. As with conventional G.728, new LPC
coefficients are computed at the third vector ET during a frame. In this case, however, the ET vectors are
synthesized during an erased frame.

As shown in Figure 1, the embodiment includes a switch 120, a buffer 110, and a bandwidth expander 115.
During normal operation switch 120 is in the position indicated by the dashed line. This means that the LPC
coefficients, a_i, are provided to the LPC synthesis filter by the synthesis filter adapter 33. Each set of newly
adapted coefficients, a_i, is stored in buffer 110 (each new set overwriting the previously saved set of
coefficients). Advantageously, bandwidth expander 115 need not operate in normal mode (if it does, its output goes
unused since switch 120 is in the dashed position).

Upon the occurrence of a frame erasure, switch 120 changes state (as shown in the solid line position).
Buffer 110 contains the last set of LPC coefficients as computed with speech signal samples from the last good
frame. At the third vector of the erased frame, the bandwidth expander 115 computes new coefficients, a'_i.

Figure 5 is a block-flow diagram of the processing performed by the bandwidth expander 115 to generate
new LPC coefficients. As shown in the Figure, expander 115 extracts the previously saved LPC coefficients
from buffer 110 (see step 1151). New coefficients a'_i are generated in accordance with expression (1):

$a'_i = (\mathrm{BEF})^i \, a_i, \quad 1 \le i \le 50, \qquad (1)$

where BEF is a bandwidth expansion factor which illustratively takes on a value in the range 0.95-0.99 and is
advantageously set to 0.97 or 0.98 (see step 1153). These newly computed coefficients are then output (see step
1155). Note that coefficients a'_i are computed only once for each erased frame.
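
Expression (1) amounts to an element-wise weighting of the saved coefficient set. A minimal sketch follows; the function name is hypothetical and the list layout (index 0 holding the first coefficient) is an assumption of this example.

```python
def bandwidth_expand(a, bef=0.97):
    """Weight coefficient number i by bef**i, for i = 1..len(a).

    a[0] is taken to hold the first coefficient, a[49] the fiftieth.
    """
    return [(bef ** i) * ai for i, ai in enumerate(a, start=1)]
```

With BEF below 1 the higher-order coefficients are attenuated most, since the exponent grows with the coefficient index.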

The newly computed coefficients are used by the LPC synthesis filter 32 for the entire erased frame. The
LPC synthesis filter uses the new coefficients as though they were computed under normal circumstances by
adapter 33. The newly computed LPC coefficients are also stored in buffer 110, as shown in Figure 1. Should
there be consecutive frame erasures, the newly computed LPC coefficients stored in the buffer 110 would be
used as the basis for another iteration of bandwidth expansion according to the process presented in Figure
5. Thus, the greater the number of consecutive erased frames, the greater the applied bandwidth expansion
(i.e., for the kth erased frame of a sequence of erased frames, the effective bandwidth expansion factor is
BEF^k).
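
The compounding effect over consecutive erasures can be checked numerically. In this sketch the three-coefficient set is a made-up stand-in for the 50 coefficients held in buffer 110, and the names are hypothetical.

```python
def bandwidth_expand(a, bef):
    """Weight coefficient number i by bef**i (a[0] holds the first coefficient)."""
    return [(bef ** i) * ai for i, ai in enumerate(a, start=1)]


BEF = 0.97
last_good = [0.5, -0.25, 0.125]   # hypothetical coefficients from the last good frame

buffer_110 = list(last_good)
iterated = []   # buffer contents after each consecutive erased frame
direct = []     # one-shot expansion of the last good set by BEF**k
for k in range(1, 4):
    buffer_110 = bandwidth_expand(buffer_110, BEF)   # expand what the buffer now holds
    iterated.append(buffer_110)
    direct.append(bandwidth_expand(last_good, BEF ** k))
```

The two lists agree to within rounding, which is the sense in which the kth erased frame sees an effective factor of BEF raised to the kth power.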

Other techniques for generating LPC coefficients during erased frames could be employed instead of the
bandwidth expansion technique described above. These include (i) the repeated use of the last set of LPC
coefficients from the last good frame and (ii) use of the synthesized excitation signal in the conventional G.728
LPC adapter 33.

C. Operation of Backward Adapters During Erased Frames



The decoder of the G.728 standard includes a synthesis filter adapter and a vector gain adapter (blocks
33 and 30, respectively, of figure 3, as well as figures 5 and 6, respectively, of the G.728 standard draft). Under
normal operation (i.e., operation in the absence of frame erasure), these adapters dynamically vary certain
parameter values based on signals present in the decoder. The decoder of the illustrative embodiment also
includes a synthesis filter adapter 330 and a vector gain adapter 300. When no frame erasure occurs, the
synthesis filter adapter 330 and the vector gain adapter 300 operate in accordance with the G.728 standard. The
operation of adapters 330, 300 differs from that of the corresponding adapters 33, 30 of G.728 only during erased
frames.

As discussed above, neither the update to LPC coefficients by adapter 330 nor the update to gain predictor
parameters by adapter 300 is needed during the occurrence of erased frames. In the case of the LPC
coefficients, this is because such coefficients are generated through a bandwidth expansion procedure. In the case
of the gain predictor parameters, this is because excitation synthesis is performed in the gain-scaled domain.
Because the outputs of blocks 330 and 300 are not needed during erased frames, signal processing operations
performed by these blocks 330, 300 may be modified to reduce computational complexity.

As may be seen in Figures 6 and 7, respectively, the adapters 330 and 300 each include several signal
processing steps indicated by blocks (blocks 49-51 in figure 6; blocks 39-48 and 67 in figure 7). These blocks
are generally the same as those defined by the G.728 standard draft. In the first good frame following one or
more erased frames, both blocks 330 and 300 form output signals based on signals they stored in memory
during an erased frame. Prior to storage, these signals were generated by the adapters based on an excitation
signal synthesized during an erased frame. In the case of the synthesis filter adapter 330, the excitation signal
is first synthesized into quantized speech prior to use by the adapter. In the case of vector gain adapter 300,
the excitation signal is used directly. In either case, both adapters need to generate signals during an erased
frame so that when the next good frame occurs, adapter output may be determined.

Advantageously, a reduced number of signal processing operations normally performed by the adapters
of Figures 6 and 7 may be performed during erased frames. The operations which are performed are those
which are either (i) needed for the formation and storage of signals used in forming adapter output in a
subsequent good (i.e., non-erased) frame or (ii) needed for the formation of signals used by other signal processing
blocks of the decoder during erased frames. No additional signal processing operations are necessary. Blocks
330 and 300 perform a reduced number of signal processing operations responsive to the receipt of the frame
erasure signal, as shown in Figures 1, 6, and 7. The frame erasure signal either prompts modified processing
or causes the module not to operate.

Note that a reduction in the number of signal processing operations in response to a frame erasure is not
required for proper operation; blocks 330 and 300 could operate normally, as though no frame erasure has
occurred, with their output signals being ignored, as discussed above. Under normal conditions, operations (i)
and (ii) are performed. Reduced signal processing operations, however, allow the overall complexity of the
decoder to remain within the level of complexity established for a G.728 decoder under normal operation. Without
reducing operations, the additional operations required to synthesize an excitation signal and
bandwidth-expand LPC coefficients would raise the overall complexity of the decoder.

In the case of the synthesis filter adapter 330 presented in Figure 6, and with reference to the pseudo-code
presented in the discussion of the "HYBRID WINDOWING MODULE" at pages 28-29 of the G.728 standard
draft, an illustrative reduced set of operations comprises (i) updating buffer memory SB using the synthesized
speech (which is obtained by passing extrapolated ET vectors through a bandwidth expanded version
of the last good LPC filter) and (ii) computing REXP in the specified manner using the updated SB buffer.

In addition, because the G.728 embodiment uses a postfilter which employs 10th-order LPC coefficients
and the first reflection coefficient during erased frames, the illustrative set of reduced operations further
comprises (iii) the generation of signal values RTMP(1) through RTMP(11) (RTMP(12) through RTMP(51) not being
needed) and, (iv) with reference to the pseudo-code presented in the discussion of the "LEVINSON-DURBIN
RECURSION MODULE" at pages 29-30 of the G.728 standard draft, Levinson-Durbin recursion performed from
order 1 to order 10 (with the recursion from order 11 through order 50 not needed). Note that bandwidth
expansion is not performed.

In the case of vector gain adapter 300 presented in Figure 7, an illustrative reduced set of operations
comprises (i) the operations of blocks 67, 39, 40, 41, and 42, which together compute the offset-removed
logarithmic gain (based on synthesized ET vectors) and GTMP, the input to block 43; (ii) with reference to the
pseudo-code presented in the discussion of the "HYBRID WINDOWING MODULE" at pages 32-33, the operations of
updating buffer memory SBLG with GTMP and updating REXPLG, the recursive component of the autocorrelation
function; and (iii) with reference to the pseudo-code presented in the discussion of the "LOG-GAIN
LINEAR PREDICTOR" at page 34, the operation of updating filter memory GSTATE with GTMP. Note that the
functions of modules 44, 45, 47 and 48 are not performed.

As a result of performing the reduced set of operations during erased frames (rather than all operations),
the decoder can properly prepare for the next good frame and provide any needed signals during erased frames
while reducing the computational complexity of the decoder.

D. Encoder Modification 

As stated above, the present invention does not require any modification to the encoder of the G.728
standard. However, such modifications may be advantageous under certain circumstances. For example, if a frame
erasure occurs at the beginning of a talk spurt (e.g., at the onset of voiced speech from silence), then a
synthesized speech signal obtained from an extrapolated excitation signal is generally not a good approximation
of the original speech. Moreover, upon the occurrence of the next good frame there is likely to be a significant
mismatch between the internal states of the decoder and those of the encoder. This mismatch of encoder and
decoder states may take some time to converge.

One way to address this circumstance is to modify the adapters of the encoder (in addition to the
above-described modifications to those of the G.728 decoder) so as to improve convergence speed. Both the LPC
filter coefficient adapter and the gain adapter (predictor) of the encoder may be modified by introducing a
spectral smoothing technique (SST) and increasing the amount of bandwidth expansion.

Figure 8 presents a modified version of the LPC synthesis filter adapter of figure 5 of the G.728 standard
draft for use in the encoder. The modified synthesis filter adapter 230 includes hybrid windowing module 49,
which generates autocorrelation coefficients; SST module 495, which performs a spectral smoothing of
autocorrelation coefficients from windowing module 49; Levinson-Durbin recursion module 50, for generating
synthesis filter coefficients; and bandwidth expansion module 510, for expanding the bandwidth of the spectral
peaks of the LPC spectrum. The SST module 495 performs spectral smoothing of autocorrelation coefficients
by multiplying the buffer of autocorrelation coefficients, RTMP(1) - RTMP(51), with the right half of a Gaussian
window having a standard deviation of 60 Hz. This windowed set of autocorrelation coefficients is then applied
to the Levinson-Durbin recursion module 50 in the normal fashion. Bandwidth expansion module 510 operates
on the synthesis filter coefficients like module 51 of the G.728 standard draft, but uses a bandwidth
expansion factor of 0.96, rather than 0.988.
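
The text does not give a formula for the Gaussian window. The following Python sketch assumes one common reading: a Gaussian of standard deviation 60 Hz in the frequency domain corresponds to a Gaussian lag window exp(-(2πσk/fs)²/2) applied to the autocorrelation lags, with fs = 8000 Hz. The function names are hypothetical and the sketch is illustrative only, not the patented implementation.

```python
import math

FS_HZ = 8000.0  # assumed sampling rate (one sample every 125 microseconds)

def gaussian_lag_window(num_lags, sigma_hz, fs_hz=FS_HZ):
    """Right half of a Gaussian lag window; lag 0 has weight 1.

    Assumption: a Gaussian with standard deviation sigma_hz in frequency
    maps to exp(-0.5 * (2*pi*sigma_hz*k/fs)**2) at lag k.
    """
    return [math.exp(-0.5 * (2.0 * math.pi * sigma_hz * k / fs_hz) ** 2)
            for k in range(num_lags)]

def spectral_smooth(autocorr, sigma_hz):
    """Multiply each autocorrelation coefficient by the window at its lag."""
    window = gaussian_lag_window(len(autocorr), sigma_hz)
    return [r * w for r, w in zip(autocorr, window)]

def bandwidth_expand(coeffs, factor):
    """Scale the i-th predictor coefficient (i = 1, 2, ...) by factor**i."""
    return [a * factor ** i for i, a in enumerate(coeffs, start=1)]

# 51 lags stand in for RTMP(1)..RTMP(51); 0.96 is the encoder-side factor.
window_51 = gaussian_lag_window(51, 60.0)
expanded = bandwidth_expand([1.0, 1.0], 0.96)
```

Under this assumption the window decays monotonically from 1 at lag 0, so higher lags are attenuated progressively and the resulting LPC spectrum is smoothed before bandwidth expansion is applied.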

Figure 9 presents a modified version of the vector gain adapter of figure 6 of the G.728 standard draft for
use in the encoder. The adapter 200 includes a hybrid windowing module 43, an SST module 435, a
Levinson-Durbin recursion module 44, and a bandwidth expansion module 450. All blocks in Figure 9 are identical to
those of figure 6 of the G.728 standard except for new blocks 435 and 450. Overall, modules 43, 435, 44, and
450 are arranged like the modules of Figure 8 referenced above. Like SST module 495 of Figure 8, SST module
435 of Figure 9 performs a spectral smoothing of autocorrelation coefficients by multiplying the buffer of
autocorrelation coefficients, R(1) - R(11), with the right half of a Gaussian window. This time, however, the
Gaussian window has a standard deviation of 45 Hz. Bandwidth expansion module 450 of Figure 9 operates on the
synthesis filter coefficients like the bandwidth expansion module 51 of figure 6 of the G.728 standard draft,
but uses a bandwidth expansion factor of 0.87, rather than 0.906.
E. An Illustrative Wireless System

As stated above, the present invention has application to wireless speech communication systems. Figure
12 presents an illustrative wireless communication system employing an embodiment of the present invention.
Figure 12 includes a transmitter 600 and a receiver 700. An illustrative embodiment of the transmitter 600 is
a wireless base station. An illustrative embodiment of the receiver 700 is a mobile user terminal, such as a
cellular or wireless telephone, or other personal communications system device. (Naturally, a wireless base
station and user terminal may also include receiver and transmitter circuitry, respectively.) The transmitter 600
includes a speech coder 610, which may be, for example, a coder according to CCITT standard G.728. The
transmitter further includes a conventional channel coder 620 to provide error detection (or detection and
correction) capability; a conventional modulator 630; and conventional radio transmission circuitry, all well known
in the art. Radio signals transmitted by transmitter 600 are received by receiver 700 through a transmission
channel. Due to, for example, possible destructive interference of various multipath components of the
transmitted signal, receiver 700 may be in a deep fade preventing the clear reception of transmitted bits. Under
such circumstances, frame erasure may occur.

Receiver 700 includes conventional radio receiver circuitry 710, conventional demodulator 720, channel
decoder 730, and a speech decoder 740 in accordance with the present invention. Note that the channel
decoder generates a frame erasure signal whenever the channel decoder determines the presence of a
substantial number of bit errors (or unreceived bits). Alternatively (or in addition to a frame erasure signal from the
channel decoder), demodulator 720 may provide a frame erasure signal to the decoder 740.

F. Discussion 

Although specific embodiments of this invention have been shown and described herein, it is to be
understood that these embodiments are merely illustrative of the many possible specific arrangements which can
be devised in application of the principles of the invention. Numerous and varied other arrangements can be
devised in accordance with these principles by those of ordinary skill in the art without departing from the spirit
and scope of the invention.

For example, while the present invention has been described in the context of the G.728 LD-CELP speech
coding system, features of the invention may be applied to other speech coding systems as well. For example,
such coding systems may include a long-term predictor (or long-term synthesis filter) for converting a
gain-scaled excitation signal to a signal having pitch periodicity. Or, such a coding system may not include a
postfilter.

In addition, the illustrative embodiment of the present invention is presented as synthesizing excitation
signal samples based on previously stored gain-scaled excitation signal samples. However, the present
invention may be implemented to synthesize excitation signal samples prior to gain-scaling (i.e., prior to
operation of gain amplifier 31). Under such circumstances, gain values must also be synthesized (e.g., extrapolated).

In the discussion above concerning the synthesis of an excitation signal during erased frames, synthesis
was accomplished illustratively through an extrapolation procedure. It will be apparent to those of skill in the
art that other synthesis techniques, such as interpolation, could be employed.

As used herein, the term "filter" refers to conventional structures for signal synthesis, as well as other
processes accomplishing a filter-like synthesis function. Such other processes include the manipulation of Fourier
transform coefficients to achieve a filter-like result (with or without the removal of perceptually irrelevant information).
APPENDIX

Draft Recommendation G.728

Coding of Speech at 16 kbit/s
Using
Low-Delay Code Excited Linear Prediction (LD-CELP)

1. INTRODUCTION

This recommendation contains the description of an algorithm for the coding of speech signals
at 16 kbit/s using Low-Delay Code Excited Linear Prediction (LD-CELP). This recommendation
is organized as follows.

In Section 2 a brief outline of the LD-CELP algorithm is given. In Sections 3 and 4, the
LD-CELP encoder and LD-CELP decoder principles are discussed, respectively. In Section 5, the
computational details pertaining to each functional algorithmic block are defined. Annexes A, B,
C and D contain tables of constants used by the LD-CELP algorithm. In Annex E the sequencing
of variable adaptation and use is given. Finally, in Appendix I information is given on procedures
applicable to the implementation verification of the algorithm.

Under further study is the future incorporation of three additional appendices (to be published
separately) consisting of LD-CELP network aspects, LD-CELP fixed-point implementation
description, and LD-CELP fixed-point verification procedures.

2. OUTLINE OF LD-CELP

The LD-CELP algorithm consists of an encoder and a decoder described in Sections 2.1 and
2.2 respectively, and illustrated in Figure 1/G.728.

The essence of CELP techniques, which is an analysis-by-synthesis approach to codebook
search, is retained in LD-CELP. The LD-CELP, however, uses backward adaptation of predictors
and gain to achieve an algorithmic delay of 0.625 ms. Only the index to the excitation codebook
is transmitted. The predictor coefficients are updated through LPC analysis of previously
quantized speech. The excitation gain is updated by using the gain information embedded in the
previously quantized excitation. The block size for the excitation vector and gain adaptation is 5
samples only. A perceptual weighting filter is updated using LPC analysis of the unquantized
speech.
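
As a worked check of the figures in this outline, the short Python sketch below derives the bit rate, codebook size, and vector duration from the 5-sample block size and the 10-bit codebook index. It assumes the 8 kHz sampling rate of telephone-band PCM (samples taken at 125 μs intervals); the variable names are illustrative.

```python
SAMPLE_RATE_HZ = 8000   # assumed: one sample every 125 microseconds
VECTOR_SIZE = 5         # samples per excitation vector
INDEX_BITS = 10         # bits transmitted per vector (codebook index)

vectors_per_second = SAMPLE_RATE_HZ / VECTOR_SIZE            # 1600 vectors/s
bit_rate = vectors_per_second * INDEX_BITS                   # bits per second
codebook_size = 2 ** INDEX_BITS                              # candidate codevectors
vector_duration_ms = 1000.0 * VECTOR_SIZE / SAMPLE_RATE_HZ   # buffering time per vector

print(bit_rate, codebook_size, vector_duration_ms)
```

Buffering one 5-sample vector takes 0.625 ms, and sending 10 bits for each of the 1600 vectors per second gives the 16 kbit/s rate of the coder.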

2.1 LD-CELP Encoder 

After the conversion from A-law or μ-law PCM to uniform PCM, the input signal is
partitioned into blocks of 5 consecutive input signal samples. For each input block, the encoder
passes each of 1024 candidate codebook vectors (stored in an excitation codebook) through a gain
scaling unit and a synthesis filter. From the resulting 1024 candidate quantized signal vectors, the
encoder identifies the one that minimizes a frequency-weighted mean-squared error measure with
respect to the input signal vector. The 10-bit codebook index of the corresponding best codebook
vector (or "codevector") which gives rise to that best candidate quantized signal vector is
transmitted to the decoder. The best codevector is then passed through the gain scaling unit and
the synthesis filter to establish the correct filter memory in preparation for the encoding of the next
signal vector. The synthesis filter coefficients and the gain are updated periodically in a backward
adaptive manner based on the previously quantized signal and gain-scaled excitation.

2.2 LD-CELP Decoder

The decoding operation is also performed on a block-by-block basis. Upon receiving each
10-bit index, the decoder performs a table look-up to extract the corresponding codevector from
the excitation codebook. The extracted codevector is then passed through a gain scaling unit and
a synthesis filter to produce the current decoded signal vector. The synthesis filter coefficients and
the gain are then updated in the same way as in the encoder. The decoded signal vector is then
passed through an adaptive postfilter to enhance the perceptual quality. The postfilter coefficients
are updated periodically using the information available at the decoder. The 5 samples of the
postfilter signal vector are next converted to 5 A-law or μ-law PCM output samples.

3. LD-CELP ENCODER PRINCIPLES 

Figure 2/G.728 is a detailed block schematic of the LD-CELP encoder. The encoder in Figure 
2/G.728 is mathematically equivalent to the encoder previously shown in Figure 1/G.728 but is 
computationally more efficient to implement. 

In the following description,

a. For each variable to be described, k is the sampling index and samples are taken at 125 μs
intervals.

b. A group of 5 consecutive samples in a given signal is called a vector of that signal. For
example, 5 consecutive speech samples form a speech vector, 5 excitation samples form an
excitation vector, and so on.

c. We use n to denote the vector index, which is different from the sample index k.

d. Four consecutive vectors build one adaptation cycle. In a later section, we also refer to
adaptation cycles as frames. The two terms are used interchangeably.

The excitation Vector Quantization (VQ) codebook index is the only information explicitly
transmitted from the encoder to the decoder. Three other types of parameters will be periodically
updated: the excitation gain, the synthesis filter coefficients, and the perceptual weighting filter
coefficients. These parameters are derived in a backward adaptive manner from signals that occur
prior to the current signal vector. The excitation gain is updated once per vector, while the
synthesis filter coefficients and the perceptual weighting filter coefficients are updated once every
4 vectors (i.e., a 20-sample, or 2.5 ms update period). Note that, although the processing sequence
in the algorithm has an adaptation cycle of 4 vectors (20 samples), the basic buffer size is still
only 1 vector (5 samples). This small buffer size makes it possible to achieve a one-way delay
less than 2 ms.

A description of each block of the encoder is given below. Since the LD-CELP coder is
mainly used for encoding speech, for convenience of description, in the following we will assume
that the input signal is speech, although in practice it can be other non-speech signals as well.
3.1 Input PCM Format Conversion

This block converts the input A-law or μ-law PCM signal $s_o(k)$ to a uniform PCM signal $s_u(k)$.

3.1.1 Internal Linear PCM Levels

In converting from A-law or μ-law to linear PCM, different internal representations are
possible, depending on the device. For example, standard tables for μ-law PCM define a linear
range of -4015.5 to +4015.5. The corresponding range for A-law PCM is -2016 to +2016. Both
tables list some output values having a fractional part of 0.5. These fractional parts cannot be
represented in an integer device unless the entire table is multiplied by 2 to make all of the values
integers. In fact, this is what is most commonly done in fixed point Digital Signal Processing
(DSP) chips. On the other hand, floating point DSP chips can represent the same values listed in
the tables. Throughout this document it is assumed that the input signal has a maximum range of
-4095 to +4095. This encompasses both the μ-law and A-law cases. In the case of A-law it implies
that when the linear conversion results in a range of -2016 to +2016, those values should be scaled
up by a factor of 2 before continuing to encode the signal. In the case of μ-law input to a fixed
point processor where the input range is converted to -8031 to +8031, it implies that values should
be scaled down by a factor of 2 before beginning the encoding process. Alternatively, these
values can be treated as being in Q1 format, meaning there is 1 bit to the right of the decimal
point. All computation involving the data would then need to take this bit into account.

For the case of 16-bit linear PCM input signals having the full dynamic range of -32768 to
+32767, the input values should be considered to be in Q3 format. This means that the input
values should be scaled down (divided) by a factor of 8. On output at the decoder the factor of 8
would be restored for these signals.
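
The scaling rules above can be summarized in a small Python sketch. The source labels ("a_law", "mu_law", "mu_law_doubled", "linear16") and the function name are hypothetical, introduced only for illustration; the scale factors themselves are the ones stated in the text.

```python
# Scale factors that bring each input representation into the nominal
# internal range of about -4095 to +4095 assumed throughout the text.
SCALE_TO_INTERNAL = {
    "a_law": 2.0,            # table range -2016..+2016, scaled up by 2
    "mu_law": 1.0,           # table range -4015.5..+4015.5, used as is
    "mu_law_doubled": 0.5,   # fixed-point table -8031..+8031, scaled down by 2
    "linear16": 0.125,       # 16-bit linear PCM treated as Q3, divided by 8
}

def to_internal_range(sample, source):
    """Scale one linear PCM sample into the coder's internal range."""
    return sample * SCALE_TO_INTERNAL[source]

def from_internal_range(sample, source):
    """Undo the input scaling on output at the decoder."""
    return sample / SCALE_TO_INTERNAL[source]
```

For example, `to_internal_range(2016, "a_law")` gives 4032.0 and `to_internal_range(8031, "mu_law_doubled")` gives 4015.5, both inside the assumed internal range.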

3.2 Vector Buffer

This block buffers 5 consecutive speech samples $s_u(5n), s_u(5n+1), \ldots, s_u(5n+4)$ to form a
5-dimensional speech vector $s(n) = [s_u(5n), s_u(5n+1), \ldots, s_u(5n+4)]$.

3.3 Adapter for Perceptual Weighting Filter

Figure 4/G.728 shows the detailed operation of the perceptual weighting filter adapter (block 3
in Figure 2/G.728). This adapter calculates the coefficients of the perceptual weighting filter once
every 4 speech vectors based on linear prediction analysis (often referred to as LPC analysis) of
unquantized speech. The coefficient updates occur at the third speech vector of every 4-vector
adaptation cycle. The coefficients are held constant in between updates.

Refer to Figure 4(a)/G.728. The calculation is performed as follows. First, the input
(unquantized) speech vector is passed through a hybrid windowing module (block 36) which
places a window on previous speech vectors and calculates the first 11 autocorrelation coefficients
of the windowed speech signal as the output. The Levinson-Durbin recursion module (block 37)
then converts these autocorrelation coefficients to predictor coefficients. Based on these predictor
coefficients, the weighting filter coefficient calculator (block 38) derives the desired coefficients of
the weighting filter. These three blocks are discussed in more detail below.
First, let us describe the principles of hybrid windowing. Since this hybrid windowing
technique will be used in three different kinds of LPC analyses, we first give a more general
description of the technique and then specialize it to different cases. Suppose the LPC analysis is
to be performed once every L signal samples. To be general, assume that the signal samples
corresponding to the current LD-CELP adaptation cycle are $s_u(m), s_u(m+1), s_u(m+2), \ldots,
s_u(m+L-1)$. Then, for backward-adaptive LPC analysis, the hybrid window is applied to all
previous signal samples with a sample index less than m (as shown in Figure 4(b)/G.728). Let
there be N non-recursive samples in the hybrid window function. Then, the signal samples
$s_u(m-1), s_u(m-2), \ldots, s_u(m-N)$ are all weighted by the non-recursive portion of the window.
Starting with $s_u(m-N-1)$, all signal samples to the left of (and including) this sample are weighted
by the recursive portion of the window, which has values $b, b\alpha, b\alpha^2, \ldots$, where $0 < b < 1$ and
$0 < \alpha < 1$.

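At time m, the hybrid window function $w_m(k)$ is defined as

$$w_m(k) = \begin{cases} f_m(k) = b\,\alpha^{-[k-(m-N-1)]}, & \text{if } k \le m-N-1 \\ g_m(k) = -\sin[c(k-m)], & \text{if } m-N \le k \le m-1 \\ 0, & \text{if } k \ge m \end{cases} \tag{1a}$$

and the window-weighted signal is

$$s_m(k) = s_u(k)\,w_m(k) = \begin{cases} s_u(k) f_m(k) = s_u(k)\,b\,\alpha^{-[k-(m-N-1)]}, & \text{if } k \le m-N-1 \\ s_u(k) g_m(k) = -s_u(k)\sin[c(k-m)], & \text{if } m-N \le k \le m-1 \\ 0, & \text{if } k \ge m \end{cases} \tag{1b}$$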

The samples of the non-recursive portion $g_m(k)$ and the initial section of the recursive portion $f_m(k)$ for
different hybrid windows are specified in Annex A. For an M-th order LPC analysis, we need to
calculate M+1 autocorrelation coefficients $R_m(i)$ for $i = 0, 1, 2, \ldots, M$. The i-th autocorrelation
coefficient for the current adaptation cycle can be expressed as

$$R_m(i) = \sum_{k=-\infty}^{m-1} s_m(k)\,s_m(k-i) = r_m(i) + \sum_{k=m-N}^{m-1} s_m(k)\,s_m(k-i), \tag{1c}$$

where

$$r_m(i) = \sum_{k=-\infty}^{m-N-1} s_m(k)\,s_m(k-i) = \sum_{k=-\infty}^{m-N-1} s_u(k)\,s_u(k-i)\,f_m(k)\,f_m(k-i). \tag{1d}$$

On the right-hand side of equation (1c), the first term $r_m(i)$ is the "recursive component" of
$R_m(i)$, while the second term is the "non-recursive component". The finite summation of the
non-recursive component is calculated for each adaptation cycle. On the other hand, the recursive
component is calculated recursively. The following paragraphs explain how.

Suppose we have calculated and stored all $r_m(i)$'s for the current adaptation cycle and want to
go on to the next adaptation cycle, which starts at sample $s_u(m+L)$. After the hybrid window is
shifted to the right by L samples, the new window-weighted signal for the next adaptation cycle
becomes
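$$s_{m+L}(k) = s_u(k)\,w_{m+L}(k) = \begin{cases} s_u(k) f_{m+L}(k) = s_u(k) f_m(k)\,\alpha^{L}, & \text{if } k \le m+L-N-1 \\ s_u(k) g_{m+L}(k) = -s_u(k)\sin[c(k-m-L)], & \text{if } m+L-N \le k \le m+L-1 \\ 0, & \text{if } k \ge m+L \end{cases} \tag{1e}$$

The recursive component of $R_{m+L}(i)$ can be written as

$$\begin{aligned} r_{m+L}(i) &= \sum_{k=-\infty}^{m+L-N-1} s_{m+L}(k)\,s_{m+L}(k-i) \\ &= \sum_{k=-\infty}^{m-N-1} s_u(k) f_m(k)\,\alpha^{L}\,s_u(k-i) f_m(k-i)\,\alpha^{L} + \sum_{k=m-N}^{m+L-N-1} s_{m+L}(k)\,s_{m+L}(k-i) \end{aligned} \tag{1f}$$

or

$$r_{m+L}(i) = \alpha^{2L}\,r_m(i) + \sum_{k=m-N}^{m+L-N-1} s_{m+L}(k)\,s_{m+L}(k-i). \tag{1g}$$

Therefore, $r_{m+L}(i)$ can be calculated recursively from $r_m(i)$ using equation (1g). This newly
calculated $r_{m+L}(i)$ is stored back to memory for use in the following adaptation cycle. The
autocorrelation coefficient $R_{m+L}(i)$ is then calculated as

$$R_{m+L}(i) = r_{m+L}(i) + \sum_{k=m+L-N}^{m+L-1} s_{m+L}(k)\,s_{m+L}(k-i). \tag{1h}$$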

So far we have described in a general manner the principles of a hybrid window calculation
procedure. The parameter values for the hybrid windowing module 36 in Figure 4(a)/G.728 are
M = 10, L = 20, N = 30, and $\alpha = (1/2)^{1/40} = 0.982820598$ (so that $\alpha^{2L} = 1/2$).
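
The recursive update of equation (1g) can be checked numerically. The Python sketch below computes the recursive component directly from its definition and then by the recursion, using M = 10, L = 20, N = 30 and α = (1/2)^(1/40) as stated above. The window constants `b` and `c` are hypothetical placeholders (the actual window samples are tabulated in Annex A), the test signal is arbitrary, and the signal is assumed to be zero before sample 0.

```python
import math

def hybrid_window(k, m, N, alpha, b, c):
    """Hybrid window value at sample index k for a window ending just before m."""
    if k >= m:
        return 0.0
    if k >= m - N:
        return -math.sin(c * (k - m))           # non-recursive portion
    return b * alpha ** ((m - N - 1) - k)        # recursive (exponential) portion

def weighted(su, k, m, N, alpha, b, c):
    """Window-weighted sample; the signal is taken as zero before k = 0."""
    return su[k] * hybrid_window(k, m, N, alpha, b, c) if k >= 0 else 0.0

def recursive_component(su, m, i, N, alpha, b, c):
    """Recursive component at lag i, summed directly over k <= m-N-1."""
    return sum(weighted(su, k, m, N, alpha, b, c) *
               weighted(su, k - i, m, N, alpha, b, c)
               for k in range(0, m - N))

def update_recursive_component(r_m, su, m, i, N, L, alpha, b, c):
    """Recursive component for the next cycle (m + L), obtained from r_m."""
    new_m = m + L
    fresh = sum(weighted(su, k, new_m, N, alpha, b, c) *
                weighted(su, k - i, new_m, N, alpha, b, c)
                for k in range(m - N, new_m - N))
    return alpha ** (2 * L) * r_m + fresh

# Parameters of module 36; b and c are illustrative values only.
M, L, N = 10, 20, 30
ALPHA = 0.5 ** (1.0 / 40.0)
B, C = 0.5, 0.05
signal = [math.sin(0.3 * k) + 0.5 * math.cos(1.1 * k) for k in range(200)]

m = 100
for lag in range(M + 1):
    r_old = recursive_component(signal, m, lag, N, ALPHA, B, C)
    r_new = update_recursive_component(r_old, signal, m, lag, N, L, ALPHA, B, C)
    r_ref = recursive_component(signal, m + L, lag, N, ALPHA, B, C)
    assert math.isclose(r_new, r_ref, rel_tol=1e-9, abs_tol=1e-12)
```

The point of the recursion is that only L new products per lag are needed in each adaptation cycle, since the contribution of all older samples is carried forward by the single multiplication with α^(2L).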



Once the 11 autocorrelation coefficients $R(i)$, $i = 0, 1, \ldots, 10$ are calculated by the hybrid
windowing procedure described above, a "white noise correction" procedure is applied. This is
done by increasing the energy $R(0)$ by a small amount:

$$R(0) \leftarrow \left(\frac{257}{256}\right) R(0). \tag{1i}$$

This has the effect of filling the spectral valleys with white noise so as to reduce the spectral
dynamic range and alleviate ill-conditioning of the subsequent Levinson-Durbin recursion. The
white noise correction factor (WNCF) of 257/256 corresponds to a white noise level about 24 dB
below the average speech power.
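
The 24 dB figure follows from the fact that the added energy is 1/256 of R(0). A minimal Python check, with a hypothetical helper name:

```python
import math

WNCF = 257.0 / 256.0  # white noise correction factor

def white_noise_correct(autocorr):
    """Return a copy of the autocorrelation list with R(0) scaled by WNCF."""
    return [autocorr[0] * WNCF] + list(autocorr[1:])

# Added noise energy relative to the signal energy R(0), in dB.
noise_level_db = 10.0 * math.log10(WNCF - 1.0)
print(round(noise_level_db, 2))
```

Only the zero-lag coefficient is changed; all other autocorrelation coefficients pass through unmodified.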

Next, using the white noise corrected autocorrelation coefficients, the Levinson-Durbin
recursion module 37 recursively computes the predictor coefficients from order 1 to order 10. Let
the j-th coefficient of the i-th order predictor be $a_j^{(i)}$. Then, the recursive procedure can be
specified as follows:

$$E(0) = R(0) \tag{2a}$$

$$k_i = -\frac{R(i) + \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E(i-1)} \tag{2b}$$

$$a_i^{(i)} = k_i \tag{2c}$$

$$a_j^{(i)} = a_j^{(i-1)} + k_i\,a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \tag{2d}$$

$$E(i) = (1 - k_i^2)\,E(i-1). \tag{2e}$$

Equations (2b) through (2e) are evaluated recursively for $i = 1, 2, \ldots, 10$, and the final solution is
given by

$$q_i = a_i^{(10)}, \quad 1 \le i \le 10. \tag{2f}$$

If we define $q_0 = 1$, then the 10-th order "prediction-error filter" (sometimes called "analysis
filter") has the transfer function

$$\tilde{Q}(z) = \sum_{i=0}^{10} q_i z^{-i}, \tag{3a}$$

and the corresponding 10-th order linear predictor is defined by the following transfer function

$$Q(z) = -\sum_{i=1}^{10} q_i z^{-i}. \tag{3b}$$
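
The recursion of equations (2a) through (2f) can be sketched in Python as follows. This is an illustrative floating-point version with a hypothetical function name, not the pseudo-code of the standard; it follows the sign convention in which the prediction-error filter is 1 + q_1 z^-1 + ... and the predictor is the negated sum.

```python
def levinson_durbin(R, order):
    """Return (q, E): coefficients q_1..q_order and the final prediction error.

    R is a list of autocorrelation coefficients R(0)..R(order).
    """
    a = [0.0] * (order + 1)      # a[j] holds the j-th coefficient; a[0] is unused
    E = R[0]                     # zeroth-order prediction error
    for i in range(1, order + 1):
        acc = R[i] + sum(a[j] * R[i - j] for j in range(1, i))
        k = -acc / E             # reflection coefficient of order i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        a = updated
        E = (1.0 - k * k) * E    # prediction error of order i
    return a[1:], E

# First-order autoregressive example: R(i) = 0.5**i.
q, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this example the recursion returns q_1 = -0.5 and q_2 = 0, so the predictor estimates each sample as 0.5 times the previous one, as expected for an autocorrelation sequence that decays by one half per lag.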

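The weighting filter coefficient calculator (block 38) calculates the perceptual weighting filter
coefficients according to the following equations:

$$W(z) = \frac{1 - Q(z/\gamma_1)}{1 - Q(z/\gamma_2)}, \quad 0 < \gamma_2 < \gamma_1 \le 1, \tag{4a}$$

$$Q(z/\gamma_1) = -\sum_{i=1}^{10} (q_i \gamma_1^i)\,z^{-i}, \tag{4b}$$

and

$$Q(z/\gamma_2) = -\sum_{i=1}^{10} (q_i \gamma_2^i)\,z^{-i}. \tag{4c}$$

The perceptual weighting filter is a 10-th order pole-zero filter defined by the transfer function
W(z) in equation (4a). The values of $\gamma_1$ and $\gamma_2$ are 0.9 and 0.6, respectively.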
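
A short Python sketch of the coefficient calculation follows. It returns the numerator and denominator polynomials of W(z) in powers of z^-1, each with a leading 1; the function name and the two-coefficient example are illustrative assumptions.

```python
def weighting_filter_coeffs(q, gamma1=0.9, gamma2=0.6):
    """Return (numerator, denominator) of W(z) as coefficient lists.

    q holds q_1, q_2, ...; 1 - Q(z/gamma) expands to 1 + sum(q_i * gamma**i * z**-i).
    """
    numerator = [1.0] + [qi * gamma1 ** i for i, qi in enumerate(q, start=1)]
    denominator = [1.0] + [qi * gamma2 ** i for i, qi in enumerate(q, start=1)]
    return numerator, denominator

num, den = weighting_filter_coeffs([-0.5, 0.25])
flat_num, flat_den = weighting_filter_coeffs([-0.5, 0.25], 0.0, 0.0)
```

With both factors set to zero the numerator and denominator reduce to 1, so W(z) = 1 and the filter is effectively disabled.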

Now refer to Figure 2/G.728. The perceptual weighting filter adapter (block 3) periodically
updates the coefficients of W(z) according to equations (2) through (4), and feeds the coefficients
to the impulse response vector calculator (block 12) and the perceptual weighting filters (blocks 4
and 10).

3.4 Perceptual Weighting Filter 

In Figure 2/G.728, the current input speech vector s(n) is passed through the perceptual
weighting filter (block 4), resulting in the weighted speech vector v(n). Note that except during
initialization, the filter memory (i.e., internal state variables, or the values held in the delay units
of the filter) should not be reset to zero at any time. On the other hand, the memory of the
perceptual weighting filter (block 10) needs special handling as described later.

3.4.1 Non-speech Operation

For modem signals or other non-speech signals, CCITT test results indicate that it is desirable
to disable the perceptual weighting filter. This is equivalent to setting W(z) = 1. This can most
easily be accomplished if $\gamma_1$ and $\gamma_2$ in equation (4a) are set equal to zero. The nominal values for
these variables in the speech mode are 0.9 and 0.6, respectively.

3.5 Synthesis Filter

In Figure 2/G.728, there are two synthesis filters (blocks 9 and 22) with identical coefficients.
Both filters are updated by the backward synthesis filter adapter (block 23). Each synthesis filter
is a 50-th order all-pole filter that consists of a feedback loop with a 50-th order LPC predictor in
the feedback branch. The transfer function of the synthesis filter is F(z) = 1/[1 - P(z)], where P(z)
is the transfer function of the 50-th order LPC predictor.

After the weighted speech vector v(n) has been obtained, a zero-input response vector r(n)
will be generated using the synthesis filter (block 9) and the perceptual weighting filter (block 10).
To accomplish this, we first open the switch 5, i.e., point it to node 6. This implies that the signal
going from node 7 to the synthesis filter 9 will be zero. We then let the synthesis filter 9 and the
perceptual weighting filter 10 "ring" for 5 samples (1 vector). This means that we continue the
filtering operation for 5 samples with a zero signal applied at node 7. The resulting output of the
perceptual weighting filter 10 is the desired zero-input response vector r(n).

Note that except for the vector right after initialization, the memory of the filters 9 and 10 is in
general non-zero; therefore, the output vector r(n) is also non-zero in general, even though the
filter input from node 7 is zero. In effect, this vector r(n) is the response of the two filters to
previous gain-scaled excitation vectors e(n-1), e(n-2), ... This vector actually represents the
effect due to filter memory up to time (n - 1).

3.6 VQ Target Vector Computation

This block subtracts the zero-input response vector r(n) from the weighted speech vector v(n)
to obtain the VQ codebook search target vector x(n).

3.7 Backward Synthesis Filter Adapter

This adapter 23 updates the coefficients of the synthesis filters 9 and 22. It takes the quantized
(synthesized) speech as input and produces a set of synthesis filter coefficients as output. Its
operation is quite similar to the perceptual weighting filter adapter 3.

A blown-up version of this adapter is shown in Figure 5/G.728. The operation of the hybrid
windowing module 49 and the Levinson-Durbin recursion module 50 is exactly the same as their
counterparts (36 and 37) in Figure 4(a)/G.728, except for the following three differences:

a. The input signal is now the quantized speech rather than the unquantized input speech, 

b. The predictor order is 50 rather than 10. 



EP 0 673 018 A2 



c. The hybrid window parameters are different: N = 35, $\alpha = (3/4)^{1/40} = 0.992833749$.

Note that the update period is still L = 20, and the white noise correction factor is still 257/256 =
1.00390625.

Let $\hat{P}(z)$ be the transfer function of the 50-th order LPC predictor; then it has the form

$$\hat{P}(z) = -\sum_{i=1}^{50} \hat{a}_i z^{-i}, \tag{5}$$

where the $\hat{a}_i$'s are the predictor coefficients. To improve robustness to channel errors, these
coefficients are modified so that the peaks in the resulting LPC spectrum have slightly larger
bandwidths. The bandwidth expansion module 51 performs this bandwidth expansion procedure
in the following way. Given the LPC predictor coefficients $\hat{a}_i$'s, a new set of coefficients $a_i$'s is
computed according to

$$a_i = \lambda^i\,\hat{a}_i, \quad i = 1, 2, \ldots, 50, \tag{6}$$

where $\lambda$ is given by

$$\lambda = \frac{253}{256} = 0.98828125. \tag{7}$$

This has the effect of moving all the poles of the synthesis filter radially toward the origin by a
factor of $\lambda$. Since the poles are moved away from the unit circle, the peaks in the frequency
response are widened.
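
The radial movement of the poles can be illustrated with a second-order example in Python. The sketch below assumes a predictor whose synthesis filter has one complex pole pair at radius 0.95 and angle 0.6 rad (arbitrary illustrative values), scales the coefficients by powers of 253/256, and recomputes the poles; the helper names are hypothetical.

```python
import cmath
import math

LAMBDA = 253.0 / 256.0  # bandwidth expansion factor of module 51

def bandwidth_expand(a_hat, lam=LAMBDA):
    """Multiply the i-th predictor coefficient (i = 1, 2, ...) by lam**i."""
    return [lam ** i * a for i, a in enumerate(a_hat, start=1)]

def poles_second_order(a1, a2):
    """Roots of z**2 + a1*z + a2, i.e. the poles of 1 / (1 + a1/z + a2/z**2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0

radius, angle = 0.95, 0.6
a_hat = [-2.0 * radius * math.cos(angle), radius * radius]

before = poles_second_order(*a_hat)
after = poles_second_order(*bandwidth_expand(a_hat))
print(abs(before[0]), abs(after[0]))
```

The pole angle is unchanged while the radius shrinks by exactly the expansion factor, which is what widens the corresponding spectral peak.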

After such bandwidth expansion, the modified LPC predictor has a transfer function of

$$P(z) = -\sum_{i=1}^{50} a_i z^{-i}. \tag{8}$$

The modified coefficients are then fed to the synthesis filters 9 and 22. These coefficients are also
fed to the impulse response vector calculator 12.

The synthesis filters 9 and 22 both have a transfer function of

$$F(z) = \frac{1}{1 - P(z)}. \tag{9}$$

Similar to the perceptual weighting filter, the synthesis filters 9 and 22 are also updated once
every 4 vectors, and the updates also occur at the third speech vector of every 4-vector adaptation
cycle. However, the updates are based on the quantized speech up to the last vector of the
previous adaptation cycle. In other words, a delay of 2 vectors is introduced before the updates
take place. This is because the Levinson-Durbin recursion module 50 and the energy table
calculator 15 (described later) are computationally intensive. As a result, even though the
autocorrelation of previously quantized speech is available at the first vector of each 4-vector
cycle, computations may require more than one vector worth of time. Therefore, to maintain a
basic buffer size of 1 vector (so as to keep the coding delay low), and to maintain real-time
operation, a 2-vector delay in filter updates is introduced in order to facilitate real-time
implementation.
3.8 Backward Vector Gain Adapter

This adapter updates the excitation gain $\sigma(n)$ for every vector time index n. The excitation
gain $\sigma(n)$ is a scaling factor used to scale the selected excitation vector y(n). The adapter 20 takes
the gain-scaled excitation vector e(n) as its input, and produces an excitation gain $\sigma(n)$ as its
output. Basically, it attempts to "predict" the gain of e(n) based on the gains of e(n-1), e(n-2), ...
by using adaptive linear prediction in the logarithmic gain domain. This backward vector gain
adapter 20 is shown in more detail in Figure 6/G.728.

Refer to Figure 6/G.728. This gain adapter operates as follows. The 1-vector delay unit 67
makes the previous gain-scaled excitation vector e(n-1) available. The Root-Mean-Square
(RMS) calculator 39 then calculates the RMS value of the vector e(n-1). Next, the logarithm
calculator 40 calculates the dB value of the RMS of e(n-1), by first computing the base 10
logarithm and then multiplying the result by 20.

In Figure 6/G.728, a log-gain offset value of 32 dB is stored in the log-gain offset value holder
41. This value is meant to be roughly equal to the average excitation gain level (in dB) during
voiced speech. The adder 42 subtracts this log-gain offset value from the logarithmic gain
produced by the logarithm calculator 40. The resulting offset-removed logarithmic gain $\delta(n-1)$ is
then used by the hybrid windowing module 43 and the Levinson-Durbin recursion module 44.
Again, blocks 43 and 44 operate in exactly the same way as blocks 36 and 37 in the perceptual
weighting filter adapter module (Figure 4(a)/G.728), except that the hybrid window parameters are
different and that the signal under analysis is now the offset-removed logarithmic gain rather than
the input speech. (Note that only one gain value is produced for every 5 speech samples.) The
hybrid window parameters of block 43 are M = 10, N = 20, L = 4, and $\alpha = (3/4)^{1/8} = 0.96467863$.

The output of the Levinson-Durbin recursion module 44 is the coefficients of a 10-th order
linear predictor with a transfer function of

    \hat{R}(z) = -\sum_{i=1}^{10} \hat{\alpha}_i z^{-i} .    (10)

The bandwidth expansion module 45 then moves the roots of this polynomial radially toward the
z-plane origin in a way similar to the module 51 in Figure 5/G.728. The resulting
bandwidth-expanded gain predictor has a transfer function of

    R(z) = -\sum_{i=1}^{10} \alpha_i z^{-i} ,    (11)

where the coefficients α_i's are computed as

    \alpha_i = (0.90625)^i \hat{\alpha}_i .    (12)

Such bandwidth expansion makes the gain adapter (block 20 in Figure 2/G.728) more robust to
channel errors. These α_i's are then used as the coefficients of the log-gain linear predictor (block
46 of Figure 6/G.728).

This predictor 46 is updated once every 4 speech vectors, and the updates take place at the
second speech vector of every 4-vector adaptation cycle. The predictor attempts to predict δ(n)
based on a linear combination of δ(n-1), δ(n-2), ..., δ(n-10). The predicted version of δ(n) is
denoted as δ̂(n) and is given by

    \hat{\delta}(n) = -\sum_{i=1}^{10} \alpha_i \delta(n-i) .    (13)

After δ̂(n) has been produced by the log-gain linear predictor 46, we add back the log-gain
offset value of 32 dB stored in 41. The log-gain limiter 47 then checks the resulting log-gain value
and clips it if the value is unreasonably large or unreasonably small. The lower and upper limits
are set to 0 dB and 60 dB, respectively. The gain limiter output is then fed to the inverse
logarithm calculator 48, which reverses the operation of the logarithm calculator 40 and converts
the gain from the dB value to the linear domain. The gain limiter ensures that the gain in the
linear domain is between 1 and 1000.
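
The signal path of the gain adapter can be sketched in a few lines of Python. This is only an illustrative sketch, not the normative procedure: the function names are hypothetical, the predictor coefficients are supplied by the caller (the hybrid windowing and Levinson-Durbin steps are not shown), and the guard against a zero RMS value is an assumption.

```python
import math

LOG_GAIN_OFFSET_DB = 32.0   # log-gain offset held in block 41
BW_FACTOR = 0.90625         # bandwidth-expansion factor of equation (12)

def bandwidth_expand(alpha_hat):
    """Equation (12): alpha_i = (0.90625)^i * alpha_hat_i for i = 1..10."""
    return [(BW_FACTOR ** i) * a for i, a in enumerate(alpha_hat, start=1)]

def offset_removed_log_gain(e_prev):
    """Blocks 39-42: RMS of e(n-1) expressed in dB, minus the 32 dB offset."""
    rms = math.sqrt(sum(x * x for x in e_prev) / len(e_prev))
    rms = max(rms, 1e-10)   # assumption: avoid log10(0) for an all-zero vector
    return 20.0 * math.log10(rms) - LOG_GAIN_OFFSET_DB

def predict_gain(alpha, delta_hist):
    """Blocks 46-48: equation (13), offset restored, limited to 0..60 dB, made linear.

    delta_hist[0] is delta(n-1), delta_hist[1] is delta(n-2), and so on.
    """
    delta_pred = -sum(a * d for a, d in zip(alpha, delta_hist))
    log_gain = min(max(delta_pred + LOG_GAIN_OFFSET_DB, 0.0), 60.0)
    return 10.0 ** (log_gain / 20.0)
```

With all-zero history the predicted gain is simply the 32 dB offset converted to the linear domain, and the limiter keeps the result between 1 and 1000 whatever the prediction.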

3.9 Codebook Search Module

In Figure 2/G.728, blocks 12 through 18 constitute a codebook search module 24. This
module searches through the 1024 candidate codevectors in the excitation VQ codebook 19 and
identifies the index of the best codevector which gives a corresponding quantized speech vector
that is closest to the input speech vector.

To reduce the codebook search complexity, the 10-bit, 1024-entry codebook is decomposed
into two smaller codebooks: a 7-bit "shape codebook" containing 128 independent codevectors
and a 3-bit "gain codebook" containing 8 scalar values that are symmetric with respect to zero
(i.e., one bit for sign, two bits for magnitude). The final output codevector is the product of the
best shape codevector (from the 7-bit shape codebook) and the best gain level (from the 3-bit gain
codebook). The 7-bit shape codebook table and the 3-bit gain codebook table are given in Annex
B.

3.9.1 Principle of Codebook Search

In principle, the codebook search module 24 scales each of the 1024 candidate codevectors by
the current excitation gain σ(n) and then passes the resulting 1024 vectors one at a time through a
cascaded filter consisting of the synthesis filter F(z) and the perceptual weighting filter W(z). The
filter memory is initialized to zero each time the module feeds a new codevector to the cascaded
filter with transfer function H(z) = F(z)W(z).

The filtering of VQ codevectors can be expressed in terms of matrix-vector multiplication.
Let y_j be the j-th codevector in the 7-bit shape codebook, and let g_i be the i-th level in the 3-bit
gain codebook. Let {h(n)} denote the impulse response sequence of the cascaded filter. Then,
when the codevector specified by the codebook indices i and j is fed to the cascaded filter H(z), the
filter output can be expressed as

    \tilde{x}_{ij} = H \sigma(n) g_i y_j ,    (14)

where

    H = \begin{bmatrix}
        h(0) & 0    & 0    & 0    & 0    \\
        h(1) & h(0) & 0    & 0    & 0    \\
        h(2) & h(1) & h(0) & 0    & 0    \\
        h(3) & h(2) & h(1) & h(0) & 0    \\
        h(4) & h(3) & h(2) & h(1) & h(0)
    \end{bmatrix} .    (15)



The codebook search module 24 searches for the best combination of indices i and j which
minimizes the following Mean-Squared Error (MSE) distortion:

    D = \| x(n) - \tilde{x}_{ij} \|^2 = \sigma^2(n) \| \hat{x}(n) - g_i H y_j \|^2 ,    (16)

where x̂(n) = x(n)/σ(n) is the gain-normalized VQ target vector. Expanding the terms gives us

    D = \sigma^2(n) \left[ \| \hat{x}(n) \|^2 - 2 g_i \hat{x}^T(n) H y_j + g_i^2 \| H y_j \|^2 \right] .    (17)

Since the term ||x̂(n)||^2 and the value of σ^2(n) are fixed during the codebook search,
minimizing D is equivalent to minimizing

    \hat{D} = -2 g_i p^T(n) y_j + g_i^2 E_j ,    (18)

where

    p(n) = H^T \hat{x}(n) ,    (19)

and

    E_j = \| H y_j \|^2 .    (20)

Note that E_j is actually the energy of the j-th filtered shape codevector and does not depend
on the VQ target vector x̂(n). Also note that the shape codevector y_j is fixed, and the matrix H
only depends on the synthesis filter and the weighting filter, which are fixed over a period of 4
speech vectors. Consequently, E_j is also fixed over a period of 4 speech vectors. Based on this
observation, when the two filters are updated, we can compute and store the 128 possible energy
terms E_j, j = 0, 1, 2, ..., 127 (corresponding to the 128 shape codevectors) and then use these
energy terms repeatedly for the codebook search during the next 4 speech vectors. This
arrangement reduces the codebook search complexity.



For further reduction in computation, we can precompute and store the two arrays

    b_i = 2 g_i    (21)

and

    c_i = g_i^2    (22)

for i = 0, 1, ..., 7. These two arrays are fixed since the g_i's are fixed. We can now express D̂ as

    \hat{D} = -b_i P_j + c_i E_j ,    (23)

where P_j = p^T(n) y_j.

Note that once the E_j, b_i, and c_i tables are precomputed and stored, the inner product term
P_j = p^T(n) y_j, which solely depends on j, takes most of the computation in determining D̂. Thus,
the codebook search procedure steps through the shape codebook and identifies the best gain
index i for each shape codevector y_j.
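
As a sanity check on the derivation, the following sketch builds H from an impulse response, forms the E_j, b_i and c_i tables, and evaluates the simplified criterion for all 8 x 128 combinations. It is a minimal illustration assuming NumPy; the function names are hypothetical, and the shape and gain tables are supplied by the caller rather than taken from Annex B.

```python
import numpy as np

def conv_matrix(h):
    """Lower-triangular Toeplitz matrix H of equation (15), built from h(0)..h(4)."""
    n = len(h)
    H = np.zeros((n, n))
    for row in range(n):
        for col in range(row + 1):
            H[row, col] = h[row - col]
    return H

def search_tables(h, shapes, gains):
    """Tables that stay fixed while the filters are fixed: H, E_j, b_i, c_i."""
    H = conv_matrix(h)
    E = np.sum((shapes @ H.T) ** 2, axis=1)   # equation (20): E_j = ||H y_j||^2
    b = 2.0 * gains                            # equation (21)
    c = gains ** 2                             # equation (22)
    return H, E, b, c

def exhaustive_search(x_hat, H, E, b, c, shapes):
    """Evaluate equation (23) for every (i, j) and return the minimizing pair."""
    p = H.T @ x_hat                            # equation (19)
    P = shapes @ p                             # P_j = p^T(n) y_j
    D_hat = -np.outer(b, P) + np.outer(c, E)   # one row per gain, one column per shape
    i, j = np.unravel_index(np.argmin(D_hat), D_hat.shape)
    return int(i), int(j)
```

Because the simplified criterion differs from D only by the positive factor σ^2(n) and a constant term, the pair returned here is also the minimizer of the full squared error of equation (16).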

There are several ways to find the best gain index i for a given shape codevector y_j.

a. The first and the most obvious way is to evaluate the 8 possible D̂ values corresponding to
the 8 possible values of i, and then pick the index i which corresponds to the smallest D̂.
However, this requires 2 multiplications for each i.

b. A second way is to compute the optimal gain ĝ = P_j/E_j first, and then quantize this gain ĝ to
one of the 8 gain levels {g_0, ..., g_7} in the 3-bit gain codebook. The best index i is the index
of the gain level g_i which is closest to ĝ. However, this approach requires a division
operation for each of the 128 shape codevectors, and division is typically very inefficient to
implement using DSP processors.

c. A third approach, which is a slightly modified version of the second approach, is
particularly efficient for DSP implementations. The quantization of ĝ can be thought of as a
series of comparisons between ĝ and the "quantizer cell boundaries", which are the mid-points
between adjacent gain levels. Let d_i be the mid-point between gain levels g_i and g_{i+1}
that have the same sign. Then, testing "ĝ < d_i" is equivalent to testing "P_j < d_i E_j".
Therefore, by using the latter test, we can avoid the division operation and still require only
one multiplication for each index i. This is the approach used in the codebook search. The
gain quantizer cell boundaries d_i's are fixed and can be precomputed and stored in a table.
For the 8 gain levels, actually only 6 boundary values d_0, d_1, d_2, d_4, d_5, and d_6 are used.

Once the best indices i and j are identified, they are concatenated to form the output of the
codebook search module: a single 10-bit best codebook index.
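
The third approach can be sketched as follows. This is a hedged illustration rather than the reference implementation: `fast_search` is a hypothetical name, the gain table is passed in by the caller and is assumed to hold four increasing positive levels at indices 0 to 3 and their negatives at indices 4 to 7, and the 10-bit index is packed with the 7 shape bits ahead of the 3 gain bits.

```python
import numpy as np

def fast_search(p, shapes, E, gains):
    """Division-free search: pick the gain cell with tests of the form P_j < d_i * E_j."""
    # Mid-points between adjacent gain levels of the same sign (cell boundaries).
    d = {i: 0.5 * (gains[i] + gains[i + 1]) for i in (0, 1, 2, 4, 5, 6)}
    best = (np.inf, 0, 0)
    for j, y in enumerate(shapes):
        P = float(np.dot(p, y))
        if P >= 0:   # positive gains
            i = next((m for m in (0, 1, 2) if P < d[m] * E[j]), 3)
        else:        # negative gains
            i = next((m for m in (4, 5, 6) if P > d[m] * E[j]), 7)
        D_hat = -2.0 * gains[i] * P + gains[i] ** 2 * E[j]
        if D_hat < best[0]:
            best = (D_hat, i, j)
    _, i, j = best
    return i, j, (j << 3) | i   # 7 shape bits followed by 3 gain bits
```

Only one gain index is evaluated per shape codevector, so the loop computes 128 distortion values instead of 1024 while still returning the same minimizer as the exhaustive evaluation.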

3.9.2 Operation of Codebook Search Module

With the codebook search principle introduced, the operation of the codebook search module
24 is now described below. Refer to Figure 2/G.728. Every time the synthesis filter 9 and
the perceptual weighting filter 10 are updated, the impulse response vector calculator 12 computes
the first 5 samples of the impulse response of the cascaded filter F(z)W(z). To compute the
impulse response vector, we first set the memory of the cascaded filter to zero, then excite the filter
with an input sequence {1, 0, 0, 0, 0}. The corresponding 5 output samples of the filter are h(0),
h(1), ..., h(4), which constitute the desired impulse response vector. After this impulse response
vector is computed, it will be held constant and used in the codebook search for the following 4
speech vectors, until the filters 9 and 10 are updated again.

Next, the shape codevector convolution module 14 computes the 128 vectors H y_j, j = 0, 1, 2,
..., 127. In other words, it convolves each shape codevector y_j, j = 0, 1, 2, ..., 127 with the impulse
response sequence h(0), h(1), ..., h(4), where the convolution is only performed for the first 5
samples. The energies of the resulting 128 vectors are then computed and stored by the energy
table calculator 15 according to equation (20). The energy of a vector is defined as the sum of the
squared value of each vector component.

Note that the computations in blocks 12, 14, and 15 are performed only once every 4 speech
vectors, while the other blocks in the codebook search module perform computations for each
speech vector. Also note that the updates of the E_j table are synchronized with the updates of the
synthesis filter coefficients. That is, the new E_j table will be used starting from the third speech
vector of every adaptation cycle. (Refer to the discussion in Section 3.7.)

The VQ target vector normalization module 16 calculates the gain-normalized VQ target
vector x̂(n) = x(n)/σ(n). In DSP implementations, it is more efficient to first compute 1/σ(n), and
then multiply each component of x(n) by 1/σ(n).

Next, the time-reversed convolution module 13 computes the vector p(n) = H^T x̂(n). This
operation is equivalent to first reversing the order of the components of x̂(n), then convolving the
resulting vector with the impulse response vector, and then reversing the component order of the
output again (and hence the name "time-reversed convolution").
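
The two convolution modules can be checked against the matrix form with a short NumPy sketch. The function names are hypothetical; the point is only that truncated convolution reproduces the product with H, and that reversing, convolving and reversing again reproduces the product with the transpose of H.

```python
import numpy as np

def shape_convolution(h, y):
    """Block 14: first len(y) samples of y convolved with h, which equals H y."""
    return np.convolve(h, y)[: len(y)]

def time_reversed_convolution(h, x_hat):
    """Block 13: reverse x_hat, convolve with h, truncate, reverse again (H^T x_hat)."""
    return np.convolve(h, x_hat[::-1])[: len(x_hat)][::-1]

def energy(v):
    """Energy of a vector: the sum of the squared value of each component."""
    return float(np.sum(np.square(v)))
```

An energy table entry is then `energy(shape_convolution(h, y_j))` for each of the 128 shape codevectors.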

Once the E_j, b_i, and c_i tables are precomputed and stored, and the vector p(n) is also calculated,
then the error calculator 17 and the best codebook index selector 18 work together to perform the
following efficient codebook search algorithm.

a. Initialize D̂_min to a number larger than the largest possible value of D̂ (or use the largest
possible number of the DSP's number representation system).

b. Set the shape codebook index j = 0.

c. Compute the inner product P_j = p^T(n) y_j.

d. If P_j < 0, go to step h to search through negative gains; otherwise, proceed to step e to
search through positive gains.

e. If P_j < d_0 E_j, set i = 0 and go to step k; otherwise proceed to step f.

f. If P_j < d_1 E_j, set i = 1 and go to step k; otherwise proceed to step g.

g. If P_j < d_2 E_j, set i = 2 and go to step k; otherwise set i = 3 and go to step k.

h. If P_j > d_4 E_j, set i = 4 and go to step k; otherwise proceed to step i.

i. If P_j > d_5 E_j, set i = 5 and go to step k; otherwise proceed to step j.

j. If P_j > d_6 E_j, set i = 6; otherwise set i = 7.

k. Compute D̂ = -b_i P_j + c_i E_j.

l. If D̂ < D̂_min, then set D̂_min = D̂, i_min = i, and j_min = j.

m. If j < 127, set j = j + 1 and go to step c; otherwise proceed to step n.

n. When the algorithm proceeds to here, all 1024 possible combinations of gains and shapes
have been searched through. The resulting i_min and j_min are the desired channel indices for
the gain and the shape, respectively. The output best codebook index (10-bit) is the
concatenation of these two indices, and the corresponding best excitation codevector is
y(n) = g_{i_min} y_{j_min}. The selected 10-bit codebook index is transmitted through the
communication channel to the decoder.

3.10 Simulated Decoder

Although the encoder has identified and transmitted the best codebook index so far, some
additional tasks have to be performed in preparation for the encoding of the following speech
vectors. First, the best codebook index is fed to the excitation VQ codebook to extract the
corresponding best codevector y(n) = g_{i_min} y_{j_min}. This best codevector is then scaled by the
current excitation gain σ(n) in the gain stage 21. The resulting gain-scaled excitation vector is
e(n) = σ(n) y(n).

This vector e(n) is then passed through the synthesis filter 22 to obtain the current quantized
speech vector s_q(n). Note that blocks 19 through 23 form a simulated decoder 8. Hence, the
quantized speech vector s_q(n) is actually the simulated decoded speech vector when there are no
channel errors. In Figure 2/G.728, the backward synthesis filter adapter 23 needs this quantized
speech vector s_q(n) to update the synthesis filter coefficients. Similarly, the backward vector gain
adapter 20 needs the gain-scaled excitation vector e(n) to update the coefficients of the log-gain
linear predictor.

One last task before proceeding to encode the next speech vector is to update the memory of
the synthesis filter 9 and the perceptual weighting filter 10. To accomplish this, we first save the
memory of filters 9 and 10 which was left over after performing the zero-input response
computation described in Section 3.5. We then set the memory of filters 9 and 10 to zero and
close the switch 5, i.e., connect it to node 7. Then, the gain-scaled excitation vector e(n) is passed
through the two zero-memory filters 9 and 10. Note that since e(n) is only 5 samples long and the
filters have zero memory, the number of multiply-adds only goes up from 0 to 4 for the 5-sample
period. This is a significant saving in computation since there would be 70 multiply-adds per
sample if the filter memory were not zero. Next, we add the saved original filter memory back to
the newly established filter memory after filtering e(n). This in effect adds the zero-input
responses to the zero-state responses of the filters 9 and 10. This results in the desired set of filter
memory which will be used to compute the zero-input response during the encoding of the next
speech vector.
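
The memory update relies on linearity: the memory obtained by filtering e(n) from the true state equals the zero-input memory plus the zero-state memory. The sketch below illustrates this for a simplified all-pole filter only (the weighting filter is pole-zero and is not modelled); the function names and the convention that memory holds past outputs, newest first, are assumptions made for the illustration.

```python
def all_pole_filter(a, x, memory):
    """Run y[k] = x[k] - sum_i a[i] * y[k-1-i]; memory holds past outputs, newest first."""
    memory = list(memory)
    out = []
    for sample in x:
        y = sample - sum(ai * mi for ai, mi in zip(a, memory))
        out.append(y)
        memory = [y] + memory[:-1]
    return out, memory

def update_memory(a, e, saved_memory):
    """Add the saved zero-input memory to the memory left by filtering e(n) from zero state."""
    _, zero_state_memory = all_pole_filter(a, e, [0.0] * len(a))
    return [s + z for s, z in zip(saved_memory, zero_state_memory)]
```

Here `saved_memory` is the memory left after the zero-input response computation, so the returned list matches what direct filtering of e(n) from the original state would have produced.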

Note that after the filter memory update, the top 5 elements of the memory of the synthesis
filter 9 are exactly the same as the components of the desired quantized speech vector s_q(n).
Therefore, we can actually omit the synthesis filter 22 and obtain s_q(n) from the updated memory
of the synthesis filter 9. This means an additional saving of 50 multiply-adds per sample.

The encoder operation described so far specifies the way to encode a single input speech
vector. The encoding of the entire speech waveform is achieved by repeating the above operation
for every speech vector.

3.11 Synchronization & In-band Signalling

In the above description of the encoder, it is assumed that the decoder knows the boundaries of
the received 10-bit codebook indices and also knows when the synthesis filter and the log-gain
predictor need to be updated (recall that they are updated once every 4 vectors). In practice, such
synchronization information can be made available to the decoder by adding extra
synchronization bits on top of the transmitted 16 kbit/s bit stream. However, in many applications
there is a need to insert synchronization or in-band signalling bits as part of the 16 kbit/s bit
stream. This can be done in the following way. Suppose a synchronization bit is to be inserted
once every N speech vectors; then, for every N-th input speech vector, we can search through only
half of the shape codebook and produce a 6-bit shape codebook index. In this way, we rob one bit
out of every N-th transmitted codebook index and insert a synchronization or signalling bit
instead.

It is important to note that we cannot arbitrarily rob one bit out of an already selected 7-bit
shape codebook index. Instead, the encoder has to know which speech vectors will be robbed one
bit and then search through only half of the codebook for those speech vectors. Otherwise, the
decoder will not have the same decoded excitation codevectors for those speech vectors.

Since the coding algorithm has a basic adaptation cycle of 4 vectors, it is reasonable to let N be 
a multiple of 4 so that the decoder can easily determine the boundaries of the encoder adaptation 
cycles. For a reasonable value of N (such as 16, which corresponds to a 10 millisecond bit
robbing period), the resulting degradation in speech quality is essentially negligible. In particular,
we have found that a value of N = 16 results in little additional distortion. The rate of this bit
robbing is only 100 bits/s.

If the above procedure is followed, we recommend that when the desired bit is to be a 0, only
the first half of the shape codebook be searched, i.e. those vectors with indices 0 to 63. When the
desired bit is a 1, then the second half of the codebook is searched and the resulting index will be
between 64 and 127. The significance of this choice is that the desired bit will be the leftmost bit
in the codeword, since the 7 bits for the shape codevector precede the 3 bits for the sign and gain
codebook. We further recommend that the synchronization bit be robbed from the last vector in a
cycle of 4 vectors. Once it is detected, the next codeword received can begin the new cycle of
codevectors.
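
The bit-robbing recommendation amounts to restricting the shape search range, as the following sketch shows. The helper names are hypothetical; the packing of 7 shape bits ahead of 3 sign and gain bits follows the description above.

```python
def shape_search_range(sync_bit=None):
    """Shape indices to search: all 128 normally, one half when a bit is robbed."""
    if sync_bit is None:
        return range(0, 128)
    return range(64 * sync_bit, 64 * sync_bit + 64)

def pack_codeword(shape_index, gain_index):
    """10-bit codeword: 7 shape bits followed by 3 sign/gain bits."""
    return (shape_index << 3) | gain_index

def robbed_bit(codeword):
    """Leftmost bit of the 10-bit codeword, which carries the synchronization bit."""
    return (codeword >> 9) & 1
```

Whatever shape index the restricted search selects, the decoder can read the synchronization bit directly from the top bit of the received codeword. With N = 16 and 5-sample vectors at 125 μs per sample, one bit is carried every 10 ms, which is the 100 bits/s quoted above.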

Although we state that synchronization causes very little distortion, we note that no formal 
testing has been done on hardware which contained this synchronization strategy. Consequently, 
the amount of the degradation has not been measured.

However, we specifically recommend against using the synchronization bit for
synchronization in systems in which the coder is turned on and off repeatedly. For example, a
system might use a speech activity detector to turn off the coder when no speech is present.
Each time the encoder is turned on, the decoder would need to locate the synchronization
sequence. At 100 bits/s, this would probably take several hundred milliseconds. In addition, time
must be allowed for the decoder state to track the encoder state. The combined result would be a
phenomenon known as front-end clipping, in which the beginning of the speech utterance would be
lost. If the encoder and decoder are both started at the same instant as the onset of speech, then no
speech will be lost. This is only possible in systems using external signalling for the start-up
times and external synchronization.

4. LD-CELP DECODER PRINCIPLES

Figure 3/G.728 is a block schematic of the LD-CELP decoder. A functional description of
each block is given in the following sections.

4.1 Excitation VQ Codebook

This block contains an excitation VQ codebook (including shape and gain codebooks)
identical to the codebook 19 in the LD-CELP encoder. It uses the received best codebook index
to extract the best codevector y(n) selected in the LD-CELP encoder.

4.2 Gain Scaling Unit

This block computes the scaled excitation vector e(n) by multiplying each component of y(n)
by the gain σ(n).

4.3 Synthesis Filter

This filter has the same transfer function as the synthesis filter in the LD-CELP encoder
(assuming error-free transmission). It filters the scaled excitation vector e(n) to produce the
decoded speech vector s_d(n). Note that in order to avoid any possible accumulation of round-off
errors during decoding, sometimes it is desirable to exactly duplicate the procedures used in the
encoder to obtain s_q(n). If this is the case, and if the encoder obtains s_q(n) from the updated
memory of the synthesis filter 9, then the decoder should also compute s_d(n) as the sum of the
zero-input response and the zero-state response of the synthesis filter 32, as is done in the encoder.

4.4 Backward Vector Gain Adapter

The function of this block is described in Section 3.8.

4.5 Backward Synthesis Filter Adapter

The function of this block is described in Section 3.7.

4.6 Postfilter

This block filters the decoded speech to enhance the perceptual quality. This block is further
expanded in Figure 7/G.728 to show more details. Refer to Figure 7/G.728. The postfilter
basically consists of three major parts: (1) long-term postfilter 71, (2) short-term postfilter 72, and
(3) output gain scaling unit 77. The other four blocks in Figure 7/G.728 are just to calculate the
appropriate scaling factor for use in the output gain scaling unit 77.

The long-term postfilter 71, sometimes called the pitch postfilter, is a comb filter with its
spectral peaks located at multiples of the fundamental frequency (or pitch frequency) of the speech
to be postfiltered. The reciprocal of the fundamental frequency is called the pitch period. The
pitch period can be extracted from the decoded speech using a pitch detector (or pitch extractor).
Let p be the fundamental pitch period (in samples) obtained by a pitch detector, then the transfer
function of the long-term postfilter can be expressed as

    H_l(z) = g_l (1 + b z^{-p}) ,    (24)

where the coefficients g_l, b and the pitch period p are updated once every 4 speech vectors (an
adaptation cycle) and the actual updates occur at the third speech vector of each adaptation cycle.
For convenience, we will from now on call an adaptation cycle a frame. The derivation of g_l, b,
and p will be described later in Section 4.7.

The short-term postfilter 72 consists of a 10th-order pole-zero filter in cascade with a first-order
all-zero filter. The 10th-order pole-zero filter attenuates the frequency components between
formant peaks, while the first-order all-zero filter attempts to compensate for the spectral tilt in the
frequency response of the 10th-order pole-zero filter.

Let ã_i, i = 1, 2, ..., 10 be the coefficients of the 10th-order LPC predictor obtained by backward
LPC analysis of the decoded speech, and let k_1 be the first reflection coefficient obtained by the
same LPC analysis. Then, both ã_i's and k_1 can be obtained as by-products of the 50th-order
backward LPC analysis (block 50 in Figure 5/G.728). All we have to do is to stop the 50th-order
Levinson-Durbin recursion at order 10, copy k_1 and ã_1, ã_2, ..., ã_10, and then resume the
Levinson-Durbin recursion from order 11 to order 50. The transfer function of the short-term
postfilter is

    H_s(z) = \frac{1 - \sum_{i=1}^{10} \bar{b}_i z^{-i}}{1 - \sum_{i=1}^{10} \bar{a}_i z^{-i}} \left[ 1 + \mu z^{-1} \right] ,    (25)

where

    \bar{b}_i = \tilde{a}_i (0.65)^i ,  i = 1, 2, ..., 10,    (26)

    \bar{a}_i = \tilde{a}_i (0.75)^i ,  i = 1, 2, ..., 10,    (27)

and

    \mu = (0.15) k_1 .    (28)

The coefficients ā_i's, b̄_i's, and μ are also updated once a frame, but the updates take place at the
first vector of each frame (i.e. as soon as ã_i's become available).
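
A minimal sketch of the short-term postfilter follows. It assumes the pole-zero section has numerator 1 - Σ b̄_i z^{-i} and denominator 1 - Σ ā_i z^{-i}, followed by the first-order section 1 + μ z^{-1}; the function names are hypothetical and the filter starts from zero memory for simplicity.

```python
def short_term_postfilter_coeffs(a_tilde, k1):
    """Scale the 10th-order LPC coefficients by 0.65^i and 0.75^i; mu = 0.15 * k1."""
    b_bar = [a * 0.65 ** i for i, a in enumerate(a_tilde, start=1)]   # zeros
    a_bar = [a * 0.75 ** i for i, a in enumerate(a_tilde, start=1)]   # poles
    mu = 0.15 * k1
    return b_bar, a_bar, mu

def short_term_postfilter(x, b_bar, a_bar, mu):
    """Apply the pole-zero section and then the first-order all-zero section to x."""
    order = len(a_bar)
    x_hist = [0.0] * order   # past inputs, newest first
    v_hist = [0.0] * order   # past pole-zero outputs, newest first
    out = []
    for s in x:
        zeros = sum(b * xh for b, xh in zip(b_bar, x_hist))
        poles = sum(a * vh for a, vh in zip(a_bar, v_hist))
        v = s - zeros + poles
        out.append(v + mu * v_hist[0])   # tilt compensation uses the previous v
        x_hist = [s] + x_hist[:-1]
        v_hist = [v] + v_hist[:-1]
    return out
```

In a running decoder the two history lists would be kept between vectors rather than reset on every call.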

In general, after the decoded speech is passed through the long-term postfilter and the short- 
term postfilter, the filtered speech will not have the same power level as the decoded (unfiltered) 
speech. To avoid occasional large gain excursions, it is necessary to use automatic gain control to 
force the postfiltered speech to have roughly the same power as the unfiltered speech. This is 
done by blocks 73 through 77. 

The sum of absolute value calculator 73 operates vector-by-vector. It takes the current
decoded speech vector s_d(n) and calculates the sum of the absolute values of its 5 vector
components. Similarly, the sum of absolute value calculator 74 performs the same type of
calculation, but on the current output vector s_f(n) of the short-term postfilter. The scaling factor
calculator 75 then divides the output value of block 73 by the output value of block 74 to obtain a
scaling factor for the current s_f(n) vector. This scaling factor is then filtered by a first-order
lowpass filter 76 to get a separate scaling factor for each of the 5 components of s_f(n). The
first-order lowpass filter 76 has a transfer function of 0.01/(1 - 0.99 z^{-1}). The lowpass filtered scaling
factor is used by the output gain scaling unit 77 to perform sample-by-sample scaling of the
short-term postfilter output. Note that since the scaling factor calculator 75 only generates one
scaling factor per vector, it would have a staircase effect on the sample-by-sample scaling
operation of block 77 if the lowpass filter 76 were not present. The lowpass filter 76 effectively
smooths out such a staircase effect.
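
The automatic gain control of blocks 73 through 77 can be sketched as follows. The function name is hypothetical, and the fallback to a scaling factor of 1 when the postfiltered vector is all zero is an assumption, since the text does not say how a zero divisor is handled.

```python
def gain_control(decoded, filtered, state=1.0):
    """Scale one postfiltered vector so that its level follows the decoded vector.

    decoded and filtered are 5-sample vectors; state is the lowpass filter memory.
    Returns the scaled vector and the updated filter memory.
    """
    num = sum(abs(s) for s in decoded)        # block 73
    den = sum(abs(s) for s in filtered)       # block 74
    target = num / den if den > 0 else 1.0    # block 75 (fallback is an assumption)
    out = []
    for s in filtered:
        state = 0.99 * state + 0.01 * target  # block 76: 0.01 / (1 - 0.99 z^-1)
        out.append(state * s)                 # block 77
    return out, state
```

Because the lowpass filter runs once per sample, the applied gain moves toward the per-vector target gradually instead of jumping at vector boundaries.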

4.6.1 Non-speech Operation

CCITT objective test results indicate that for some non-speech
signals, the performance of the coder is improved when the adaptive postfilter is turned off. Since
the input to the adaptive postfilter is the output of the synthesis filter, this signal is always
available. In an actual implementation this unfiltered signal shall be output when the switch is set
to disable the postfilter.

4.7 Postfilter Adapter

This block calculates and updates the coefficients of the postfilter once a frame. This postfilter 
adapter is further expanded in Figure 8/G.728. 

Refer to Figure 8/G.728. The 10th-order LPC inverse filter 81 and the pitch period extraction
module 82 work together to extract the pitch period from the decoded speech. In fact, any pitch
extractor with reasonable performance (and without introducing additional delay) may be used
here. What we describe here is only one possible way of implementing a pitch extractor.

The 10th-order LPC inverse filter 81 has a transfer function of

    \tilde{A}(z) = 1 - \sum_{i=1}^{10} \tilde{a}_i z^{-i} ,    (29)

where the coefficients ã_i's are supplied by the Levinson-Durbin recursion module (block 50 of
Figure 5/G.728) and are updated at the first vector of each frame. This LPC inverse filter takes the
decoded speech as its input and produces the LPC prediction residual sequence {d(k)} as its
output. We use a pitch analysis window size of 100 samples and a range of pitch period from 20
to 140 samples. The pitch period extraction module 82 maintains a long buffer to hold the last
240 samples of the LPC prediction residual. For indexing convenience, the 240 LPC residual
samples stored in the buffer are indexed as d(-139), d(-138), ..., d(100).

The pitch period extraction module 82 extracts the pitch period once a frame, and the pitch
period is extracted at the third vector of each frame. Therefore, the LPC inverse filter output
vectors should be stored into the LPC residual buffer in a special order: the LPC residual vector
corresponding to the fourth vector of the last frame is stored as d(81), d(82), ..., d(85), the LPC
residual of the first vector of the current frame is stored as d(86), d(87), ..., d(90), the LPC residual
of the second vector of the current frame is stored as d(91), d(92), ..., d(95), and the LPC residual of
the third vector is stored as d(96), d(97), ..., d(100). The samples d(-139), d(-138), ..., d(80) are
simply the previous LPC residual samples arranged in the correct time order.

Once the LPC residual buffer is ready, the pitch period extraction module 82 works in the
following way. First, the last 20 samples of the LPC residual buffer (d(81) through d(100)) are
lowpass filtered at 1 kHz by a third-order elliptic filter (coefficients given in Annex D) and then
4:1 decimated (i.e. down-sampled by a factor of 4). This results in 5 lowpass filtered and
decimated LPC residual samples, denoted d̄(21), d̄(22), ..., d̄(25), which are stored as the last 5
samples in a decimated LPC residual buffer. Besides these 5 samples, the other 55 samples
d̄(-34), d̄(-33), ..., d̄(20) in the decimated LPC residual buffer are obtained by shifting previous
frames of decimated LPC residual samples. The i-th correlation of the decimated LPC residual
samples is then computed as

    \rho(i) = \sum_{n=1}^{25} \bar{d}(n) \bar{d}(n-i)    (30)

for time lags i = 5, 6, 7, ..., 35 (which correspond to pitch periods from 20 to 140 samples). The
time lag τ which gives the largest of the 31 calculated correlation values is then identified. Since
this time lag τ is the lag in the 4:1 decimated residual domain, the corresponding time lag which
gives the maximum correlation in the original undecimated residual domain should lie between
4τ-3 and 4τ+3. To get the original time resolution, we next use the undecimated LPC residual
buffer to compute the correlation of the undecimated LPC residual

    C(i) = \sum_{k=1}^{100} d(k) d(k-i)    (31)

for 7 lags i = 4τ-3, 4τ-2, ..., 4τ+3. Out of the 7 time lags, the lag p_0 that gives the largest correlation
is identified.
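
The coarse-then-fine lag search can be sketched as below. This is an illustration under stated assumptions: the function name and buffer layout are hypothetical, the decimated buffer is supplied by the caller (the Annex D elliptic filter is not reproduced), and clamping the fine search to the 20 to 140 sample range is an assumption that keeps all indices inside the 240-sample buffer.

```python
import numpy as np

def extract_pitch_lag(d, d_bar):
    """Two-stage lag search: correlation on decimated samples, then refinement.

    d     : 240 residual samples, d[k + 139] holds d(k) for k = -139..100
    d_bar : 60 decimated samples, d_bar[n + 34] holds d_bar(n) for n = -34..25
    """
    def rho(i):    # decimated-domain correlation, n = 1..25
        return sum(d_bar[n + 34] * d_bar[n - i + 34] for n in range(1, 26))

    def corr(i):   # undecimated correlation, k = 1..100
        return sum(d[k + 139] * d[k - i + 139] for k in range(1, 101))

    tau = max(range(5, 36), key=rho)          # coarse lag in the 4:1 decimated domain
    lo = max(4 * tau - 3, 20)                 # assumption: stay within 20..140 samples
    hi = min(4 * tau + 3, 140)
    return max(range(lo, hi + 1), key=corr)   # p_0
```

For a quick experiment, `d_bar = d.reshape(60, 4).sum(axis=1)` can stand in for the lowpass filter and decimator; this block-sum is only a crude substitute and not the filter specified in Annex D.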

The time lag p_0 found this way may turn out to be a multiple of the true fundamental pitch
period. What we need in the long-term postfilter is the true fundamental pitch period, not any
multiple of it. Therefore, we need to do more processing to find the fundamental pitch period. We
make use of the fact that we estimate the pitch period quite frequently, once every 20 speech
samples. Since the pitch period typically varies between 20 and 140 samples, our frequent pitch
estimation means that, at the beginning of each talk spurt, we will first get the fundamental pitch
period before the multiple pitch periods have a chance to show up in the correlation peak-picking
process described above. From there on, we will have a chance to lock on to the fundamental
pitch period by checking to see if there is any correlation peak in the neighborhood of the pitch
period of the previous frame.

Let p̂ be the pitch period of the previous frame. If the time lag p_0 obtained above is not in the
neighborhood of p̂, then we also evaluate equation (31) for i = p̂-6, p̂-5, ..., p̂+5, p̂+6. Out of these
13 possible time lags, the time lag p_1 that gives the largest correlation is identified. We then test
to see if this new lag p_1 should be used as the output pitch period of the current frame. First we
compute

    \beta_0 = \frac{\sum_{k=1}^{100} d(k) d(k-p_0)}{\sum_{k=1}^{100} d(k-p_0) d(k-p_0)} ,    (32)

which is the optimal tap weight of a single-tap pitch predictor with a lag of p_0 samples. The value
of β_0 is then clamped between 0 and 1. Next, we also compute

    \beta_1 = \frac{\sum_{k=1}^{100} d(k) d(k-p_1)}{\sum_{k=1}^{100} d(k-p_1) d(k-p_1)} ,    (33)

which is the optimal tap weight of a single-tap pitch predictor with a lag of p_1 samples. The value
of β_1 is then also clamped between 0 and 1. Then, the output pitch period p of block 82 is given
by

    p = \begin{cases} p_0 & \text{if } \beta_1 \le 0.4 \beta_0 \\ p_1 & \text{if } \beta_1 > 0.4 \beta_0 \end{cases}    (34)
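
The check against pitch multiples can be sketched as follows. The function names are hypothetical, the previous pitch period is assumed to lie between 20 and 140 samples, and two details are assumptions because the text leaves them open: the "neighborhood" is taken to mean a difference of fewer than 6 samples, and a zero-energy denominator yields a tap weight of 0.

```python
def tap_weight(d, lag):
    """Optimal single-tap predictor weight over k = 1..100, clamped to [0, 1].

    d[k + 139] holds the residual sample d(k) for k = -139..100.
    """
    num = sum(d[k + 139] * d[k - lag + 139] for k in range(1, 101))
    den = sum(d[k - lag + 139] ** 2 for k in range(1, 101))
    beta = num / den if den > 0 else 0.0      # zero-energy fallback is an assumption
    return min(max(beta, 0.0), 1.0)

def choose_pitch(d, p0, p_prev, tolerance=6):
    """Return p0, or a lag near the previous pitch if its tap weight is large enough."""
    if abs(p0 - p_prev) < tolerance:          # assumption: meaning of "neighborhood"
        return p0
    lags = [i for i in range(p_prev - 6, p_prev + 7) if 20 <= i <= 140]
    p1 = max(lags, key=lambda i: sum(d[k + 139] * d[k - i + 139] for k in range(1, 101)))
    return p1 if tap_weight(d, p1) > 0.4 * tap_weight(d, p0) else p0
```

When the residual is strongly periodic at the previous frame's pitch, the lag near that pitch passes the 0.4 threshold and replaces a p_0 that landed on a multiple.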



After the pitch period extraction module 82 extracts the pitch period p, the pitch predictor tap
calculator 83 then calculates the optimal tap weight of a single-tap pitch predictor for the decoded
speech. The pitch predictor tap calculator 83 and the long-term postfilter 71 share a long buffer of
decoded speech samples. This buffer contains decoded speech samples s_d(-239), s_d(-238),
s_d(-237), ..., s_d(4), s_d(5), where s_d(1) through s_d(5) correspond to the current vector of decoded
speech. The long-term postfilter 71 uses this buffer as the delay unit of the filter. On the other
hand, the pitch predictor tap calculator 83 uses this buffer to calculate

    \beta = \frac{\sum_{k=-99}^{0} s_d(k) s_d(k-p)}{\sum_{k=-99}^{0} s_d(k-p) s_d(k-p)} .    (35)



The long-term postfilter coefficient calculator 84 then takes the pitch period p and the pitch
predictor tap β and calculates the long-term postfilter coefficients b and g_l as follows.

    b = \begin{cases} 0 & \text{if } \beta < 0.6 \\ 0.15 \beta & \text{if } 0.6 \le \beta \le 1 \\ 0.15 & \text{if } \beta > 1 \end{cases}    (36)

    g_l = \frac{1}{1 + b}    (37)



In general, the closer $\beta$ is to unity, the more periodic the speech waveform is. As can be seen
in equations (36) and (37), if $\beta < 0.6$, which roughly corresponds to unvoiced or transition regions
of speech, then $b = 0$ and $g_l = 1$, and the long-term postfilter transfer function becomes $H_l(z) = 1$,
which means the filtering operation of the long-term postfilter is totally disabled. On the other
hand, if $0.6 \le \beta \le 1$, the long-term postfilter is turned on, and the degree of comb filtering is
determined by $\beta$. The more periodic the speech waveform, the more comb filtering is performed.
Finally, if $\beta > 1$, then $b$ is limited to 0.15; this is to avoid too much comb filtering. The coefficient
$g_l$ is a scaling factor of the long-term postfilter to ensure that the voiced regions of speech
waveforms do not get amplified relative to the unvoiced or transition regions. (If $g_l$ were held
constant at unity, then after the long-term postfiltering, the voiced regions would be amplified by a
factor of $1+b$ roughly. This would make some consonants, which correspond to unvoiced and
transition regions, sound unclear or too soft.)
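
As a worked illustration of equations (36) and (37), the following Python sketch maps a pitch predictor tap to the two long-term postfilter coefficients. The constant names follow the PPFTH and PPFZCF entries of the coder parameter table; the function name is hypothetical.

```python
PPFTH = 0.6    # tap threshold for turning off the pitch postfilter
PPFZCF = 0.15  # pitch postfilter zero controlling factor


def long_term_postfilter_coeffs(beta):
    """Return (b, g_l) for a pitch predictor tap `beta`, per equations (36)-(37)."""
    if beta < PPFTH:
        b = 0.0                 # unvoiced or transition: postfilter disabled
    elif beta <= 1.0:
        b = PPFZCF * beta       # comb filtering grows with periodicity
    else:
        b = PPFZCF              # cap to avoid too much comb filtering
    g_l = 1.0 / (1.0 + b)       # keeps voiced regions from being amplified
    return b, g_l
```

With beta = 0.5 the result is (0, 1), so the filter reduces to a pass-through; with beta = 0.8 the comb coefficient is 0.12 and the output is scaled by 1/1.12.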

The short-term postfilter coefficient calculator 85 calculates the short-term postfilter
coefficients $\bar{a}_i$'s, $\bar{b}_i$'s, and $\mu$ at the first vector of each frame according to equations (26), (27), and
(28).






4.8 Output PCM Format Conversion

This block converts the 5 components of the decoded speech vector into 5 corresponding A-law
or μ-law PCM samples and outputs these 5 PCM samples sequentially at 125 μs time intervals.
Note that if the internal linear PCM format has been scaled as described in section 3.1.1, the
inverse scaling must be performed before conversion to A-law or μ-law PCM.

5. COMPUTATIONAL DETAILS 

This section provides the computational details for each of the LD-CELP encoder and decoder 
elements. Sections 5.1 and 5.2 list the names of coder parameters and internal processing 
variables which will be referred to in later sections. The detailed specification of each block in 
Figure 2/G.728 through Figure 6/G.728 is given in Section 5.3 through the end of Section 5. To 
encode and decode an input speech vector, the various blocks of the encoder and the decoder are 
executed in an order which roughly follows the sequence from Section 5.3 to the end. 

5.1 Description of Basic Coder Parameters

The names of basic coder parameters are defined in Table 1/G.728. In Table 1/G.728, the first
column gives the names of coder parameters which will be used in later detailed description of the
LD-CELP algorithm. If a parameter has been referred to in Section 3 or 4 but was represented by
a different symbol, that equivalent symbol will be given in the second column for easy reference.
Each coder parameter has a fixed value which is determined in the coder design stage. The third
column shows these fixed parameter values, and the fourth column is a brief description of the
coder parameters.






Table 1/G.728 Basic Coder Parameters of LD-CELP

Name      Equivalent  Value     Description
          Symbol
--------  ----------  --------  ------------------------------------------------------------
AGCFAC                0.99      AGC adaptation speed controlling factor
FAC       λ           253/256   Bandwidth expansion factor of synthesis filter
FACGP     λ_g         29/32     Bandwidth expansion factor of log-gain predictor
DIMINV                0.2       Reciprocal of vector dimension
IDIM                  5         Vector dimension (excitation block size)
GOFF                  32        Log-gain offset value
KPDELTA               6         Allowed deviation from previous pitch period
KPMIN                 20        Minimum pitch period (samples)
KPMAX                 140       Maximum pitch period (samples)
LPC                   50        Synthesis filter order
LPCLG                 10        Log-gain predictor order
LPCW                  10        Perceptual weighting filter order
NCWD                  128       Shape codebook size (no. of codevectors)
NFRSZ                 20        Frame size (adaptation cycle size in samples)
NG                    8         Gain codebook size (no. of gain levels)
NONR                  35        No. of non-recursive window samples for synthesis filter
NONRLG                20        No. of non-recursive window samples for log-gain predictor
NONRW                 30        No. of non-recursive window samples for weighting filter
NPWSZ                 100       Pitch analysis window size (samples)
NUPDATE               4         Predictor update period (in terms of vectors)
PPFTH                 0.6       Tap threshold for turning off pitch postfilter
PPFZCF                0.15      Pitch postfilter zero controlling factor
SPFPCF                0.75      Short-term postfilter pole controlling factor
SPFZCF                0.65      Short-term postfilter zero controlling factor
TAPTH                 0.4       Tap threshold for fundamental pitch replacement
TILTF                 0.15      Spectral tilt compensation controlling factor
WNCF                  257/256   White noise correction factor
WPCF      γ_2         0.6       Pole controlling factor of perceptual weighting filter
WZCF      γ_1         0.9       Zero controlling factor of perceptual weighting filter



5.2 Description of Internal Variables

The internal processing variables of LD-CELP are listed in Table 2/G.728, which has a layout
similar to Table 1/G.728. The second column shows the range of index in each variable array. The
fourth column gives the recommended initial values of the variables. The initial values of some
arrays are given in Annexes A, B or C. It is recommended (although not required) that the
internal variables be set to their initial values when the encoder or decoder just starts running, or
whenever a reset of coder states is needed (such as in DCME applications). These initial values
ensure that there will be no glitches right after start-up or resets.

Note that some variable arrays can share the same physical memory locations to save memory 
space, although they are given different names in the tables to enhance clarity. 

As mentioned in earlier sections, the processing sequence has a basic adaptation cycle of 4
speech vectors. The variable ICOUNT is used as the vector index. In other words, ICOUNT = n
when the encoder or decoder is processing the n-th speech vector in an adaptation cycle.

Table 2/G.728 LD-CELP Internal Processing Variables

Name      Array Index     Equivalent     Initial           Description
          Range           Symbol         Value
--------  --------------  -------------  ----------------  -----------------------------------------------
A         1 to LPC+1                     1,0,0,...         Synthesis filter coefficients
AL        1 to 3                         Annex D           1 kHz lowpass filter denominator coeff.
AP        1 to 11         -ā_{i-1}       1,0,0,...         Short-term postfilter denominator coeff.
APF       1 to 11                        1,0,0,...         10th-order LPC filter coefficients
ATMP      1 to LPC+1                                       Temporary buffer for synthesis filter coeff.
AWP       1 to LPCW+1                    1,0,0,...         Perceptual weighting filter denominator coeff.
AWZ       1 to LPCW+1                    1,0,0,...         Perceptual weighting filter numerator coeff.
AWZTMP    1 to LPCW+1                    1,0,0,...         Temporary buffer for weighting filter coeff.
AZ        1 to 11         -b̄_{i-1}       1,0,0,...         Short-term postfilter numerator coeff.
B         1               b              0                 Long-term postfilter coefficient
BL        1 to 4                         Annex D           1 kHz lowpass filter numerator coeff.
DEC       -34 to 25       d̄(n)           0,0,...,0         4:1 decimated LPC prediction residual
D         -139 to 100     d(k)           0,0,...,0         LPC prediction residual
ET        1 to IDIM       e(n)           0,0,...,0         Gain-scaled excitation vector
FACV      1 to LPC+1      λ^{i-1}        Annex C           Synthesis filter BW broadening vector
FACGPV    1 to LPCLG+1                   Annex C           Gain predictor BW broadening vector
G2        1 to NG         b_i            Annex B           2 times gain levels in gain codebook
GAIN      1               σ(n)                             Excitation gain
GB        1 to NG-1       d_i            Annex B           Mid-point between adjacent gain levels
GL        1               g_l            1                 Long-term postfilter scaling factor
GP        1 to LPCLG+1                   1,-1,0,0,...      Log-gain linear predictor coeff.
GPTMP     1 to LPCLG+1                                     Temp. array for log-gain linear predictor coeff.
GQ        1 to NG                        Annex B           Gain levels in the gain codebook
GSQ       1 to NG                        Annex B           Squares of gain levels in gain codebook
GSTATE    1 to LPCLG      δ(n)           -32,-32,...,-32   Memory of the log-gain linear predictor
GTMP      1 to 4                         -32,-32,-32,-32   Temporary log-gain buffer
H         1 to IDIM       h(n)           1,0,0,0,0         Impulse response vector of F(z)W(z)
ICHAN     1                                                Best codebook index to be transmitted
ICOUNT    1                                                Speech vector counter (indexed from 1 to 4)
IG        1               i                                Best 3-bit gain codebook index
IP        1                              IPINIT*           Address pointer to LPC prediction residual
IS        1               j                                Best 7-bit shape codebook index
KP        1               p                                Pitch period of the current frame
KP1       1               p̂              50                Pitch period of the previous frame
PN        1 to IDIM       p(n)                             Correlation vector for codebook search
PTAP      1                                                Pitch predictor tap computed by block 83
R         1 to NR+1*                                       Autocorrelation coefficients
RC        1 to NR*                                         Reflection coeff., also as a scratch array
RCTMP     1 to LPC                                         Temporary buffer for reflection coeff.
REXP      1 to LPC+1                     0,0,...,0         Recursive part of autocorrelation, syn. filter
REXPLG    1 to LPCLG+1                   0,0,...,0         Recursive part of autocorrelation, log-gain pred.
REXPW     1 to LPCW+1                    0,0,...,0         Recursive part of autocorrelation, weighting filter

* NR = Max(LPCW, LPCLG) > IDIM
  IPINIT = NPWSZ-NFRSZ+IDIM

Table 2/G.728 LD-CELP Internal Processing Variables (Continued)

Name      Array Index     Equivalent     Initial           Description
          Range           Symbol         Value
--------  --------------  -------------  ----------------  -----------------------------------------------
RTMP      1 to LPC+1                                       Temporary buffer for autocorrelation coeff.
S         1 to IDIM                      0,0,...,0         Uniform PCM input speech vector
SB        1 to 105                       0,0,...,0         Buffer for previously quantized speech
SBLG      1 to 34                        0,0,...,0         Buffer for previous log-gain
SBW       1 to 60                        0,0,...,0         Buffer for previous input speech
SCALE     1                                                Unfiltered postfilter scaling factor
SCALEFIL  1                              1                 Lowpass filtered postfilter scaling factor
SD        1 to IDIM                                        Decoded speech buffer
SPF       1 to IDIM                                        Postfiltered speech vector
SPFPCFV   1 to 11         SPFPCF^{i-1}   Annex C           Short-term postfilter pole controlling vector
SPFZCFV   1 to 11         SPFZCF^{i-1}   Annex C           Short-term postfilter zero controlling vector
SO        1               s_o(k)                           A-law or μ-law PCM input speech sample
SU        1               s_u(k)                           Uniform PCM input speech sample
ST        -239 to IDIM    s_q(n)         0,0,...,0         Quantized speech vector
STATELPC  1 to LPC                       0,0,...,0         Synthesis filter memory
STLPCI    1 to 10                        0,0,...,0         LPC inverse filter memory
STLPF     1 to 3                         0,0,0             1 kHz lowpass filter memory
STMP      1 to 4*IDIM                    0,0,...,0         Buffer for per. wt. filter hybrid window
STPFFIR   1 to 10                        0,0,...,0         Short-term postfilter memory, all-zero section
STPFIIR   10                             0,0,...,0         Short-term postfilter memory, all-pole section
SUMFIL    1                                                Sum of absolute value of postfiltered speech
SUMUNFIL  1                                                Sum of absolute value of decoded speech
SW        1 to IDIM       v(n)                             Perceptually weighted speech vector
TARGET    1 to IDIM       x(n), x̂(n)                       (Gain-normalized) VQ target vector
TEMP      1 to IDIM                                        Scratch array for temporary working space
TILTZ     1               μ              0                 Short-term postfilter tilt-compensation coeff.
WFIR      1 to LPCW                      0,0,...,0         Memory of weighting filter 4, all-zero portion
WIIR      1 to LPCW                      0,0,...,0         Memory of weighting filter 4, all-pole portion
WNR       1 to 105                       Annex A           Window function for synthesis filter
WNRLG     1 to 34                        Annex A           Window function for log-gain predictor
WNRW      1 to 60         w_m(k)         Annex A           Window function for weighting filter
WPCFV     1 to LPCW+1                    Annex C           Perceptual weighting filter pole controlling vector
WS        1 to 105                                         Work space array for intermediate variables
WZCFV     1 to LPCW+1                    Annex C           Perceptual weighting filter zero controlling vector
Y         1 to IDIM*NCWD                 Annex B           Shape codebook array
Y2        1 to NCWD                      Energy of y_j     Energy of convolved shape codevector
YN        1 to IDIM       y(n)                             Quantized excitation vector
ZIRWFIR   1 to LPCW                      0,0,...,0         Memory of weighting filter 10, all-zero portion
ZIRWIIR   1 to LPCW                      0,0,...,0         Memory of weighting filter 10, all-pole portion


It should be noted that, for the convenience of Levinson-Durbin recursion, the first elements of the
A, ATMP, AWP, AWZ, and GP arrays are always 1 and never get changed, and, for i ≥ 2, the i-th
elements are the (i-1)-th elements of the corresponding symbols in Section 3.

In the following sections, the asterisk * denotes arithmetic multiplication.

5.3 Input PCM Format Conversion (block 1)

Input: SO
Output: SU

Function: Convert A-law or μ-law or 16-bit linear input sample to uniform PCM sample.

Since the operation of this block is completely defined in CCITT Recommendations G.721 or
G.711, we will not repeat it here. However, recall from section 3.1.1 that some scaling may be
necessary to conform to this description's specification of an input range of -4095 to +4095.



5.4 Vector Buffer (block 2)

Input: SU
Output: S

Function: Buffer 5 consecutive uniform PCM speech samples to form a single 5-dimensional
speech vector.



5.5 Adapter for Perceptual Weighting Filter (block 3, Figure 4(a)/G.728)

The three blocks (36, 37 and 38) in Figure 4(a)/G.728 are now specified in detail below.

HYBRID WINDOWING MODULE (block 36)

Input: STMP
Output: R

Function: Apply the hybrid window to input speech and compute autocorrelation coefficients.

The operation of this module is now described below, using a "Fortran-like" style, with loop
boundaries indicated by indentation and comments on the right-hand side of "|". The following
algorithm is to be used once every adaptation cycle (20 samples). The STMP array holds 4
consecutive input speech vectors up to the second speech vector of the current adaptation cycle.
That is, STMP(1) through STMP(5) is the third input speech vector of the previous adaptation
cycle (zero initially), STMP(6) through STMP(10) is the fourth input speech vector of the
previous adaptation cycle (zero initially), STMP(11) through STMP(15) is the first input speech
vector of the current adaptation cycle, and STMP(16) through STMP(20) is the second input
speech vector of the current adaptation cycle.

    N1=LPCW+NFRSZ                                | compute some constants (can be
    N2=LPCW+NONRW                                | precomputed and stored in memory)
    N3=LPCW+NFRSZ+NONRW

    For N=1,2,...,N2, do the next line
        SBW(N)=SBW(N+NFRSZ)                      | shift the old signal buffer;
    For N=1,2,...,NFRSZ, do the next line
        SBW(N2+N)=STMP(N)                        | shift in the new signal;
                                                 | SBW(N3) is the newest sample
    K=1
    For N=N3,N3-1,...,3,2,1, do the next 2 lines
        WS(N)=SBW(N)*WNRW(K)                     | multiply the window function
        K=K+1

    For I=1,2,...,LPCW+1, do the next 4 lines
        TMP=0.
        For N=LPCW+1,LPCW+2,...,N1, do the next line
            TMP=TMP+WS(N)*WS(N+1-I)
        REXPW(I)=(1/2)*REXPW(I)+TMP              | update the recursive component

    For I=1,2,...,LPCW+1, do the next 3 lines
        R(I)=REXPW(I)
        For N=N1+1,N1+2,...,N3, do the next line
            R(I)=R(I)+WS(N)*WS(N+1-I)            | add the non-recursive component

    R(1)=R(1)*WNCF                               | white noise correction
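
The hybrid windowing steps above can be sketched in Python with 0-based lists. This is a minimal illustration, not a bit-exact implementation: the function name and argument layout are assumptions, `wn[0]` is taken to weight the newest sample (mirroring the K=1 start of the window loop), and `decay` stands for the factor applied to the recursive component (1/2 for this block).

```python
def hybrid_window_autocorr(sb, new, wn, rexp, order, decay, wncf=257.0 / 256.0):
    """Update buffer `sb` and recursive sums `rexp` in place; return r[0..order].

    sb   : signal buffer of length order + len(new) + (non-recursive length)
    new  : the newest samples (one adaptation cycle)
    wn   : window of the same length as sb, wn[0] applied to the newest sample
    rexp : recursive part of the autocorrelation, length order + 1
    """
    nfrsz = len(new)
    n3 = len(sb)
    n1 = order + nfrsz
    sb[:] = sb[nfrsz:] + list(new)                 # shift old samples out, new ones in
    ws = [sb[n] * wn[n3 - 1 - n] for n in range(n3)]
    r = []
    for i in range(order + 1):
        # Recursive component: oldest part of the new block, decayed history.
        tmp = sum(ws[n] * ws[n - i] for n in range(order, n1))
        rexp[i] = decay * rexp[i] + tmp
        # Non-recursive component: the most recent samples.
        r.append(rexp[i] + sum(ws[n] * ws[n - i] for n in range(n1, n3)))
    r[0] *= wncf                                   # white noise correction
    return r
```

Keeping `rexp` between calls is what makes the window "hybrid": only the short non-recursive tail is recomputed from scratch each cycle.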



LEVINSON-DURBIN RECURSION MODULE (block 37)

Input: R (output of block 36)
Output: AWZTMP

Function: Convert autocorrelation coefficients to linear predictor coefficients.

This block is executed once every 4-vector adaptation cycle. It is done at ICOUNT=3 after the
processing of block 36 has finished. Since the Levinson-Durbin recursion is well-known prior art,
the algorithm is given below without explanation.

    If R(LPCW+1) = 0, go to LABEL                        | Skip if zero
    If R(1) ≤ 0, go to LABEL                             | Skip if zero signal.

    RC(1)=-R(2)/R(1)
    AWZTMP(1)=1.
    AWZTMP(2)=RC(1)                                      | First-order predictor
    ALPHA=R(1)+R(2)*RC(1)
    If ALPHA ≤ 0, go to LABEL                            | Abort if ill-conditioned

    For MINC=2,3,4,...,LPCW, do the following
        SUM=0.
        For IP=1,2,3,...,MINC, do the next 2 lines
            N1=MINC-IP+2
            SUM=SUM+R(N1)*AWZTMP(IP)

        RC(MINC)=-SUM/ALPHA                              | Reflection coeff.
        MH=MINC/2+1
        For IP=2,3,4,...,MH, do the next 4 lines
            IB=MINC-IP+2
            AT=AWZTMP(IP)+RC(MINC)*AWZTMP(IB)
            AWZTMP(IB)=AWZTMP(IB)+RC(MINC)*AWZTMP(IP)    | Update predictor coeff.
            AWZTMP(IP)=AT

        AWZTMP(MINC+1)=RC(MINC)
        ALPHA=ALPHA+RC(MINC)*SUM                         | Prediction residual energy.
        If ALPHA ≤ 0, go to LABEL                        | Abort if ill-conditioned.

    Repeat the above for the next MINC

                                                         | Program terminates normally
    Exit this program                                    | if execution proceeds to
                                                         | here.

    LABEL: If program proceeds to here, ill-conditioning had happened,
           then, skip block 38, do not update the weighting filter coefficients.
           (That is, use the weighting filter coefficients of the previous
           adaptation cycle.)
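
For readers more comfortable with a general-purpose language, the recursion can be sketched in Python. The sketch assumes 0-based lists (so `r[0]` plays the role of R(1) and `a[0]` the role of AWZTMP(1)), returns `None` where the pseudocode jumps to LABEL, and uses a hypothetical function name.

```python
def levinson_durbin(r, order):
    """Return predictor coefficients [1, a1, ..., a_order], or None if ill-conditioned.

    r : autocorrelation values r[0..order]
    """
    if r[0] <= 0.0:
        return None                      # zero signal
    a = [1.0] + [0.0] * order
    rc = -r[1] / r[0]
    a[1] = rc                            # first-order predictor
    alpha = r[0] + r[1] * rc             # prediction residual energy
    if alpha <= 0.0:
        return None
    for m in range(2, order + 1):
        s = sum(r[m - i] * a[i] for i in range(m))
        rc = -s / alpha                  # reflection coefficient
        for i in range(1, m // 2 + 1):   # update coefficients in symmetric pairs
            j = m - i
            at = a[i] + rc * a[j]
            a[j] = a[j] + rc * a[i]
            a[i] = at
        a[m] = rc
        alpha += rc * s
        if alpha <= 0.0:
            return None                  # abort if ill-conditioned
    return a
```

For the autocorrelation of a first-order process, r = [1, 0.5, 0.25], the second-order solution is [1, -0.5, 0]: the second coefficient vanishes because one tap already predicts the sequence.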



WEIGHTING FILTER COEFFICIENT CALCULATOR (block 38)

Input: AWZTMP
Output: AWZ, AWP

Function: Calculate the perceptual weighting filter coefficients from the linear predictor
coefficients for input speech.

This block is executed once every adaptation cycle. It is done at ICOUNT=3 after the processing
of block 37 has finished.

    For I=2,3,...,LPCW+1, do the next line
        AWP(I)=WPCFV(I)*AWZTMP(I)                | Denominator coeff.

    For I=2,3,...,LPCW+1, do the next line
        AWZ(I)=WZCFV(I)*AWZTMP(I)                | Numerator coeff.


5.6 Backward Synthesis Filter Adapter (block 23, Figure 5/G.728)

The three blocks (49, 50, and 51) in Figure 5/G.728 are specified below.

HYBRID WINDOWING MODULE (block 49)

Input: STTMP
Output: RTMP

Function: Apply the hybrid window to quantized speech and compute autocorrelation
coefficients.

The operation of this block is essentially the same as in block 36, except for some
substitutions of parameters and variables, and for the sampling instant when the autocorrelation
coefficients are obtained. As described in Section 3, the autocorrelation coefficients are computed
based on the quantized speech vectors up to the last vector in the previous 4-vector adaptation
cycle. In other words, the autocorrelation coefficients used in the current adaptation cycle are
based on the information contained in the quantized speech up to the last (20-th) sample of the
previous adaptation cycle. (This is in fact how we define the adaptation cycle.) The STTMP array
contains the 4 quantized speech vectors of the previous adaptation cycle.

    N1=LPC+NFRSZ                                 | compute some constants (can be
    N2=LPC+NONR                                  | precomputed and stored in memory)
    N3=LPC+NFRSZ+NONR

    For N=1,2,...,N2, do the next line
        SB(N)=SB(N+NFRSZ)                        | shift the old signal buffer;
    For N=1,2,...,NFRSZ, do the next line
        SB(N2+N)=STTMP(N)                        | shift in the new signal;
                                                 | SB(N3) is the newest sample
    K=1
    For N=N3,N3-1,...,3,2,1, do the next 2 lines
        WS(N)=SB(N)*WNR(K)                       | multiply the window function
        K=K+1

    For I=1,2,...,LPC+1, do the next 4 lines
        TMP=0.
        For N=LPC+1,LPC+2,...,N1, do the next line
            TMP=TMP+WS(N)*WS(N+1-I)
        REXP(I)=(3/4)*REXP(I)+TMP                | update the recursive component

    For I=1,2,...,LPC+1, do the next 3 lines
        RTMP(I)=REXP(I)
        For N=N1+1,N1+2,...,N3, do the next line
            RTMP(I)=RTMP(I)+WS(N)*WS(N+1-I)      | add the non-recursive component

    RTMP(1)=RTMP(1)*WNCF                         | white noise correction



LEVINSON-DURBIN RECURSION MODULE (block 50)

Input: RTMP
Output: ATMP

Function: Convert autocorrelation coefficients to synthesis filter coefficients.

The operation of this block is exactly the same as in block 37, except for some substitutions of
parameters and variables. However, special care should be taken when implementing this block.
As described in Section 3, although the autocorrelation RTMP array is available at the first vector
of each adaptation cycle, the actual updates of synthesis filter coefficients will not take place until
the third vector. This intentional delay of updates allows the real-time hardware to spread the
computation of this module over the first three vectors of each adaptation cycle. While this
module is being executed during the first two vectors of each cycle, the old set of synthesis filter
coefficients (the array "A") obtained in the previous cycle is still being used. This is why we need
to keep a separate array ATMP to avoid overwriting the old "A" array. Similarly, RTMP,
RCTMP, ALPHATMP, etc. are used to avoid interference to other Levinson-Durbin recursion
modules (blocks 37 and 44).

    If RTMP(LPC+1) = 0, go to LABEL                      | Skip if zero
    If RTMP(1) ≤ 0, go to LABEL                          | Skip if zero signal.

    RCTMP(1)=-RTMP(2)/RTMP(1)
    ATMP(1)=1.
    ATMP(2)=RCTMP(1)                                     | First-order predictor
    ALPHATMP=RTMP(1)+RTMP(2)*RCTMP(1)
    If ALPHATMP ≤ 0, go to LABEL                         | Abort if ill-conditioned

    For MINC=2,3,4,...,LPC, do the following
        SUM=0.
        For IP=1,2,3,...,MINC, do the next 2 lines
            N1=MINC-IP+2
            SUM=SUM+RTMP(N1)*ATMP(IP)

        RCTMP(MINC)=-SUM/ALPHATMP                        | Reflection coeff.
        MH=MINC/2+1
        For IP=2,3,4,...,MH, do the next 4 lines
            IB=MINC-IP+2
            AT=ATMP(IP)+RCTMP(MINC)*ATMP(IB)
            ATMP(IB)=ATMP(IB)+RCTMP(MINC)*ATMP(IP)       | Update predictor coeff.
            ATMP(IP)=AT

        ATMP(MINC+1)=RCTMP(MINC)
        ALPHATMP=ALPHATMP+RCTMP(MINC)*SUM                | Pred. residual energy.
        If ALPHATMP ≤ 0, go to LABEL                     | Abort if ill-conditioned.

    Repeat the above for the next MINC

                                                         | Recursion completed normally
    Exit this program                                    | if execution proceeds to
                                                         | here.

    LABEL: If program proceeds to here, ill-conditioning had happened,
           then, skip block 51, do not update the synthesis filter coefficients.
           (That is, use the synthesis filter coefficients of the previous
           adaptation cycle.)

BANDWIDTH EXPANSION MODULE (block 51)

Input: ATMP
Output: A

Function: Scale synthesis filter coefficients to expand the bandwidths of spectral peaks.

This block is executed only once every adaptation cycle. It is done after the processing of block
50 has finished and before the execution of blocks 9 and 10 at ICOUNT=3 take place. When the
execution of this module is finished and ICOUNT=3, then we copy the ATMP array to the "A"
array to update the filter coefficients.

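The scaling performed by the bandwidth expansion module can be illustrated with a short Python sketch. It assumes that the broadening vector holds successive powers of the expansion factor (1, λ, λ², ...), so that the leading coefficient of 1 is left unchanged; the function name is hypothetical and lists are 0-based.

```python
def bandwidth_expand(a, factor):
    """Scale coefficient a[i] by factor**i; a[0] == 1 is left unchanged."""
    return [c * factor ** i for i, c in enumerate(a)]
```

Multiplying the i-th coefficient by λ^i moves every pole of the filter toward the origin by the factor λ, which widens sharp spectral peaks. For the synthesis filter the factor would be FAC from the coder parameter table.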
5.7 Backward Vector Gain Adapter (block 20, Figure 6/G.728)

The blocks in Figure 6/G.728 are specified below. For implementation efficiency, some
blocks are described together as a single block (they are shown separately in Figure 6/G.728 just
to explain the concept). All blocks in Figure 6/G.728 are executed once every speech vector,
except for blocks 43, 44 and 45, which are executed only when ICOUNT=2.

1-VECTOR DELAY, RMS CALCULATOR, AND LOGARITHM CALCULATOR

(blocks 67, 39, and 40)



Input: ET
Output: ETRMS

Function: Calculate the dB level of the Root-Mean Square (RMS) value of the previous gain-
scaled excitation vector.

When these three blocks are executed (which is before the VQ codebook search), the ET array
contains the gain-scaled excitation vector determined for the previous speech vector. Therefore,
the 1-vector delay unit (block 67) is automatically executed. (It appears in Figure 6/G.728 just to
enhance clarity.) Since the logarithm calculator immediately follows the RMS calculator, the
square root operation in the RMS calculator can be implemented as a "divide-by-two" operation to
the output of the logarithm calculator. Hence, the output of the logarithm calculator (the dB
value) is 10 * log10 (energy of ET / IDIM). To avoid overflow of logarithm value when ET = 0
(after system initialization or reset), the argument of the logarithm operation is clipped to 1 if it is
too small. Also, we note that ETRMS is usually kept in an accumulator, as it is a temporary value
which is immediately processed in block 42.



    ETRMS = ET(1)*ET(1)                          |
    For K=2,3,...,IDIM, do the next line         | Compute energy of ET.
        ETRMS = ETRMS + ET(K)*ET(K)              |
    ETRMS = ETRMS*DIMINV                         | Divide by IDIM.
    If ETRMS < 1., set ETRMS = 1.                | Clip to avoid log overflow.
    ETRMS = 10 * log10 (ETRMS)                   | Compute dB value.

LOG-GAIN OFFSET SUBTRACTOR (block 42)

Input: ETRMS, GOFF
Output: GSTATE(1)

Function: Subtract the log-gain offset value held in block 41 from the output of block 40 (dB
gain level).

    GSTATE(1) = ETRMS - GOFF
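
The dB-level computation and offset subtraction can be sketched in Python as follows. The function names are hypothetical, and the default offset of 32 dB is an assumption chosen to be consistent with the initial log-gain memory value of -32 listed in the internal variable table.

```python
import math


def excitation_log_gain(et):
    """dB level of the mean energy of the excitation vector, clipped at 0 dB."""
    energy = sum(x * x for x in et) / len(et)   # energy of ET divided by IDIM
    energy = max(energy, 1.0)                   # clip to avoid log of zero
    return 10.0 * math.log10(energy)


def offset_removed_log_gain(et, goff=32.0):
    """Value stored in the log-gain predictor memory (assumed offset of 32 dB)."""
    return excitation_log_gain(et) - goff
```

Taking 10*log10 of the mean energy, rather than 20*log10 of its square root, is the "divide-by-two" shortcut described above.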



HYBRID WINDOWING MODULE (block 43)

Input: GTMP
Output: R

Function: Apply the hybrid window to offset-subtracted log-gain sequence and compute
autocorrelation coefficients.

The operation of this block is very similar to block 36, except for some substitutions of
parameters and variables, and for the sampling instant when the autocorrelation coefficients are
obtained.

An important difference between block 36 and this block is that only 4 (rather than 20) gain
samples are fed to this block each time the block is executed.

The log-gain predictor coefficients are updated at the second vector of each adaptation cycle.
The GTMP array below contains 4 offset-removed log-gain values, starting from the log-gain of
the second vector of the previous adaptation cycle, which is GTMP(1), to the log-gain of the first
vector of the current adaptation cycle. GTMP(4) is the offset-removed log-gain value from the first
vector of the current adaptation cycle, the newest value.

    N1=LPCLG+NUPDATE                             | compute some constants (can be
    N2=LPCLG+NONRLG                              | precomputed and stored in memory)
    N3=LPCLG+NUPDATE+NONRLG

    For N=1,2,...,N2, do the next line
        SBLG(N)=SBLG(N+NUPDATE)                  | shift the old signal buffer;
    For N=1,2,...,NUPDATE, do the next line
        SBLG(N2+N)=GTMP(N)                       | shift in the new signal;
                                                 | SBLG(N3) is the newest sample
    K=1
    For N=N3,N3-1,...,3,2,1, do the next 2 lines
        WS(N)=SBLG(N)*WNRLG(K)                   | multiply the window function
        K=K+1

    For I=1,2,...,LPCLG+1, do the next 4 lines
        TMP=0.
        For N=LPCLG+1,LPCLG+2,...,N1, do the next line
            TMP=TMP+WS(N)*WS(N+1-I)
        REXPLG(I)=(3/4)*REXPLG(I)+TMP            | update the recursive component

    For I=1,2,...,LPCLG+1, do the next 3 lines
        R(I)=REXPLG(I)
        For N=N1+1,N1+2,...,N3, do the next line
            R(I)=R(I)+WS(N)*WS(N+1-I)            | add the non-recursive component

    R(1)=R(1)*WNCF                               | white noise correction



LEVINSON-DURBIN RECURSION MODULE (block 44)

Input: R (output of block 43)
Output: GPTMP

Function: Convert autocorrelation coefficients to log-gain predictor coefficients.

The operation of this block is exactly the same as in block 37, except for the substitutions of
parameters and variables indicated below: replace LPCW by LPCLG and AWZ by GP. This
block is executed only when ICOUNT=2, after block 43 is executed. Note that as the first step,
the value of R(LPCLG+1) will be checked. If it is zero, we skip blocks 44 and 45 without
updating the log-gain predictor coefficients. (That is, we keep using the old log-gain predictor
coefficients determined in the previous adaptation cycle.) This special procedure is designed to
avoid a very small glitch that would have otherwise happened right after system initialization or
reset. In case the matrix is ill-conditioned, we also skip block 45 and use the old values.

BANDWIDTH EXPANSION MODULE (block 45)

Input: GPTMP
Output: GP

Function: Scale log-gain predictor coefficients to expand the bandwidths of spectral peaks.

This block is executed only when ICOUNT=2, after block 44 is executed.

    For I=2,3,...,LPCLG+1, do the next line
        GP(I)=FACGPV(I)*GPTMP(I)                 | scale coeff.



LOG-GAIN LINEAR PREDICTOR (block 46)

Input: GP, GSTATE
Output: GAIN

Function: Predict the current value of the offset-subtracted log-gain.

    GAIN = 0.
    For I=LPCLG,LPCLG-1,...,3,2, do the next 2 lines
        GAIN = GAIN - GP(I+1)*GSTATE(I)
        GSTATE(I) = GSTATE(I-1)

    GAIN = GAIN - GP(2)*GSTATE(1)

LOG-GAIN OFFSET ADDER (between blocks 46 and 47)

Input: GAIN, GOFF
Output: GAIN

Function: Add the log-gain offset value back to the log-gain predictor output.

    GAIN = GAIN + GOFF



LOG-GAIN LIMITER (block 47)

Input: GAIN
Output: GAIN

Function: Limit the range of the predicted logarithmic gain.

    If GAIN < 0., set GAIN = 0.                  | Correspond to linear gain 1.
    If GAIN > 60., set GAIN = 60.                | Correspond to linear gain 1000.



INVERSE LOGARITHM CALCULATOR (block 48)

Input: GAIN
Output: GAIN

Function: Convert the predicted logarithmic gain (in dB) back to linear domain.

    GAIN = 10 ^ (GAIN/20)

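The chain of blocks 46 to 48 (prediction, offset addition, limiting, and conversion back to the linear domain) can be sketched in Python. The sketch is an illustration under stated assumptions: the function name is hypothetical, lists are 0-based with `gp[0]` equal to 1, the memory shift of GSTATE is left out, and the offset of 32 dB and the limits of 0 and 60 dB are assumed values.

```python
def predict_gain(gp, gstate, goff=32.0, lo_db=0.0, hi_db=60.0):
    """Predict the linear excitation gain from past offset-removed log-gains.

    gp     : predictor coefficients [1, c1, ..., cN] (gp[0] is not used)
    gstate : past offset-removed log-gains in dB, newest first
    """
    log_gain = -sum(c * g for c, g in zip(gp[1:], gstate))  # linear prediction
    log_gain += goff                                        # add the offset back
    log_gain = min(hi_db, max(lo_db, log_gain))             # limit the range
    return 10.0 ** (log_gain / 20.0)                        # dB to linear
```

With the initial coefficients 1, -1, 0, 0, ... the predictor simply repeats the most recent log-gain, so an initial memory of -32 dB yields a linear gain of 1 after the offset is restored.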
5.8 Perceptual Weighting Filter

PERCEPTUAL WEIGHTING FILTER (block 4)

Input: S, AWZ, AWP
Output: SW

Function: Filter the input speech vector to achieve perceptual weighting.

    For K=1,2,...,IDIM, do the following
        SW(K) = S(K)
        For J=LPCW,LPCW-1,...,3,2, do the next 2 lines
            SW(K) = SW(K) + WFIR(J)*AWZ(J+1)     | All-zero part
            WFIR(J) = WFIR(J-1)                  | of the filter.

        SW(K) = SW(K) + WFIR(1)*AWZ(2)           | Handle last one
        WFIR(1) = S(K)                           | differently.

        For J=LPCW,LPCW-1,...,3,2, do the next 2 lines
            SW(K) = SW(K) - WIIR(J)*AWP(J+1)     | All-pole part
            WIIR(J) = WIIR(J-1)                  | of the filter.

        SW(K) = SW(K) - WIIR(1)*AWP(2)           | Handle last one
        WIIR(1) = SW(K)                          | differently.

    Repeat the above for the next K

5.9 Computation of Zero-Input Response Vector

Section 2.5 explains how a "zero-input response vector" r(n) is computed by blocks 9 and 10.
Now the operation of these two blocks during this phase is specified below. Their operation
during the "memory update phase" will be described later.

SYNTHESIS FILTER (block 9) DURING ZERO-INPUT RESPONSE COMPUTATION

Input: A, STATELPC
Output: TEMP

Function: Compute the zero-input response vector of the synthesis filter.

    For K=1,2,...,IDIM, do the following
        TEMP(K)=0.
        For J=LPC,LPC-1,...,3,2, do the next 2 lines
            TEMP(K)=TEMP(K)-STATELPC(J)*A(J+1)   | Multiply-add.
            STATELPC(J)=STATELPC(J-1)            | Memory shift.

        TEMP(K)=TEMP(K)-STATELPC(1)*A(2)         | Handle last one
        STATELPC(1)=TEMP(K)                      | differently.

    Repeat the above for the next K

PERCEPTUAL WEIGHTING FILTER DURING ZERO-INPUT RESPONSE COMPUTATION (block 10)

Input: AWZ, AWP, ZIRWFIR, ZIRWIIR, TEMP computed above
Output: ZIR

Function: Compute the zero-input response vector of the perceptual weighting filter.

    For K=1,2,...,IDIM, do the following
        TMP = TEMP(K)
        For J=LPCW,LPCW-1,...,3,2, do the next 2 lines
            TEMP(K) = TEMP(K) + ZIRWFIR(J)*AWZ(J+1)    | All-zero part
            ZIRWFIR(J) = ZIRWFIR(J-1)                  | of the filter.

        TEMP(K) = TEMP(K) + ZIRWFIR(1)*AWZ(2)          | Handle last one
        ZIRWFIR(1) = TMP                               | differently.

        For J=LPCW,LPCW-1,...,3,2, do the next 2 lines
            TEMP(K) = TEMP(K) - ZIRWIIR(J)*AWP(J+1)    | All-pole part
            ZIRWIIR(J) = ZIRWIIR(J-1)                  | of the filter.

        ZIR(K) = TEMP(K) - ZIRWIIR(1)*AWP(2)           | Handle last one
        ZIRWIIR(1) = ZIR(K)                            | differently.

    Repeat the above for the next K

5.10 VQ Target Vector Computation

VQ TARGET VECTOR COMPUTATION (block 11)



Input: SW, ZIR
Output: TARGET

Function: Subtract the zero-input response vector from the weighted speech vector.

Note: ZIR(K)=ZIRWIIR(IDIM+1-K) from block 10 above. It does not require a separate storage
location.

    For K=1,2,...,IDIM, do the next line
        TARGET(K) = SW(K) - ZIR(K)



5.11 Codtbook Starch Module (block 24) 

Trie 7 blocks contained within the codebook search module (block 24) are specified below. 
Again, some blocks are described as a single block for convenience and implementation 
efficiency. Blocks 12, 14, and 15 are executed once every adaptation cycle when ICOUNT=3. 
while the other blocks are executed once every speech vector. 

IMPULSE RESPONSE VECTOR CALCULATOR (block 12)

Input: A, AWZ, AWP
Output: H

Function: Compute the impulse response vector of the cascaded synthesis filter and perceptual
weighting filter.

This block is executed when ICOUNT=3 and after the execution of blocks 23 and 3 is completed
(i.e., when the new sets of A, AWZ, AWP coefficients are ready).

TEMP(1)=1.                                   | TEMP = synthesis filter memory
RC(1)=1.                                     | RC = W(z) all-pole part memory
For K=2,3,...,IDIM, do the following
   A0=0.
   A1=0.
   A2=0.
   For I=K,K-1,...,3,2, do the next 5 lines
      TEMP(I)=TEMP(I-1)
      RC(I)=RC(I-1)
      A0=A0-A(I)*TEMP(I)                     | Filtering.
      A1=A1+AWZ(I)*TEMP(I)
      A2=A2-AWP(I)*RC(I)
   TEMP(1)=A0
   RC(1)=A0+A1+A2
Repeat the above indented section for the next K

ITMP=IDIM+1                                  | Obtain h(n) by reversing
For K=1,2,...,IDIM, do the next line         | the order of the memory of
   H(K)=RC(ITMP-K)                           | all-pole section of W(z)
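
The recursion in block 12 can be cross-checked against a plain direct-form computation. The following is a minimal Python sketch, not the recommendation's procedure: the function name `cascade_impulse_response` and the 0-based coefficient lists (each starting with the z^0 term 1.0, so `a[1]` plays the role of A(2)) are assumptions made for illustration.

```python
def cascade_impulse_response(a, awz, awp, n):
    """Return the first n impulse-response samples of 1/A(z) cascaded with AWZ(z)/AWP(z).

    Each coefficient list starts with 1.0 (the z^0 term), so a[1] here
    corresponds to A(2) in the 1-based pseudo-code.
    """
    y = []  # synthesis filter output (TEMP in the pseudo-code)
    h = []  # weighting filter output (RC in the pseudo-code)
    for k in range(n):
        x = 1.0 if k == 0 else 0.0  # unit impulse input
        # All-pole synthesis filter contribution.
        a0 = x - sum(a[i] * y[k - i] for i in range(1, min(k, len(a) - 1) + 1))
        # All-zero part of the weighting filter, driven by the synthesis output.
        a1 = sum(awz[i] * y[k - i] for i in range(1, min(k, len(awz) - 1) + 1))
        # All-pole part of the weighting filter, driven by its own past output.
        a2 = -sum(awp[i] * h[k - i] for i in range(1, min(k, len(awp) - 1) + 1))
        y.append(a0)
        h.append(a0 + a1 + a2)
    return h
```

Unlike the pseudo-code, this sketch keeps the samples in natural time order, so no final reversal step is needed.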



SHAPE CODEVECTOR CONVOLUTION MODULE AND ENERGY TABLE CALCULATOR
(blocks 14 and 15)

Input: H, Y
Output: Y2

Function: Convolve each shape codevector with the impulse response obtained in block 12,
then compute and store the energy of the resulting vector.

This block is also executed when ICOUNT=3 after the execution of block 12 is completed.

For J=1,2,...,NCWD, do the following          | One codevector per loop.
   J1=(J-1)*IDIM
   For K=1,2,...,IDIM, do the next 4 lines
      K1=J1+K+1
      TEMP(K)=0.
      For I=1,2,...,K, do the next line
         TEMP(K)=TEMP(K)+H(I)*Y(K1-I)         | Convolution.
   Repeat the above 4 lines for the next K
   Y2(J)=0.
   For K=1,2,...,IDIM, do the next line
      Y2(J)=Y2(J)+TEMP(K)*TEMP(K)             | Compute energy.
Repeat the above for the next J
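
As an illustration of blocks 14 and 15, the sketch below convolves one shape codevector with a truncated impulse response and returns the energy of the result. It is a hedged Python rendering with 0-based indexing; the function name `filtered_energy` is hypothetical, and `h` is assumed to hold at least as many samples as the codevector.

```python
def filtered_energy(h, shape):
    """Convolve `shape` with impulse response `h`, truncated to len(shape) samples,
    and return (filtered vector, energy of the filtered vector)."""
    n = len(shape)
    # filtered[k] = sum_{i=0..k} h[i] * shape[k - i]
    filtered = [sum(h[i] * shape[k - i] for i in range(k + 1)) for k in range(n)]
    energy = sum(v * v for v in filtered)
    return filtered, energy
```

Because the impulse response changes only when the filter coefficients are updated, these energies can be tabulated once per adaptation cycle and reused for every speech vector in that cycle.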



VQ TARGET VECTOR NORMALIZATION (block 16)

Input: TARGET, GAIN
Output: TARGET

Function: Normalize the VQ target vector using the predicted excitation gain.

TMP = 1. / GAIN
For K=1,2,...,IDIM, do the next line
   TARGET(K) = TARGET(K) * TMP



TIME-REVERSED CONVOLUTION MODULE (block 13)

Input: H, TARGET (output from block 16)
Output: PN

Function: Perform time-reversed convolution of the impulse response vector and the
normalized VQ target vector (to obtain the vector p(n)).

Note: The vector PN can be kept in temporary storage.

For K=1,2,...,IDIM, do the following
   K1=K-1
   PN(K)=0.
   For J=K,K+1,...,IDIM, do the next line
      PN(K)=PN(K)+TARGET(J)*H(J-K1)
Repeat the above for the next K
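
The point of the time-reversed convolution is that the inner product of PN with an unfiltered shape codevector equals the inner product of the target with the filtered codevector, so codevectors need not be filtered during the search. The Python sketch below illustrates this with 0-based indexing; the function names are hypothetical.

```python
def time_reversed_convolution(h, target):
    """pn[k] = sum_{j=k}^{n-1} target[j] * h[j - k] (0-based form of block 13)."""
    n = len(target)
    return [sum(target[j] * h[j - k] for j in range(k, n)) for k in range(n)]


def convolve_truncated(h, y):
    """Convolution of y with h, truncated to len(y) samples."""
    return [sum(h[i] * y[k - i] for i in range(k + 1)) for k in range(len(y))]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))
```

For any codevector `y`, `dot(time_reversed_convolution(h, target), y)` should match `dot(target, convolve_truncated(h, y))` up to rounding.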



ERROR CALCULATOR AND BEST CODEBOOK INDEX SELECTOR (blocks 17 and 18)

Input: PN, Y, Y2, GB, G2, GSQ
Output: IG, IS, ICHAN

Function: Search through the gain codebook and the shape codebook to identify the best
combination of gain codebook index and shape codebook index, and combine the two to obtain
the 10-bit best codebook index.

Notes: The variable COR used below is usually kept in an accumulator, rather than storing it in
memory. The variables IDXG and J can be kept in temporary registers, while IG and IS can be
kept in memory.

Initialize DISTM to the largest number representable in the hardware
N1=NG/2
For J=1,2,...,NCWD, do the following
   J1=(J-1)*IDIM
   COR=0.
   For K=1,2,...,IDIM, do the next line
      COR=COR+PN(K)*Y(J1+K)                   | Compute inner product Pj.
   If COR > 0., then do the next 5 lines
      IDXG=N1
      For K=1,2,...,N1-1, do the next "if" statement
         If COR < GB(K)*Y2(J), do the next 2 lines
            IDXG=K                            | Best positive gain found.
            GO TO LABEL
   If COR ≤ 0., then do the next 5 lines
      IDXG=NG
      For K=N1+1,N1+2,...,NG-1, do the next "if" statement
         If COR > GB(K)*Y2(J), do the next 2 lines
            IDXG=K                            | Best negative gain found.
            GO TO LABEL
   LABEL: D=-G2(IDXG)*COR+GSQ(IDXG)*Y2(J)     | Compute distortion D.
   If D < DISTM, do the next 3 lines
      DISTM=D                                 | Save the lowest distortion
      IG=IDXG                                 | and the best codebook
      IS=J                                    | indices so far.
Repeat the above indented section for the next J

ICHAN = (IS - 1) * NG + (IG - 1)              | Concatenate shape and gain
                                              | codebook indices.

Transmit ICHAN through communication channel.

For serial bit stream transmission, the most significant bit of ICHAN should be transmitted first.
If ICHAN is represented by the 10-bit word b9 b8 b7 b6 b5 b4 b3 b2 b1 b0, then the order of the
transmitted bits should be b9, and then b8, and then b7, ..., and finally b0. (b9 is the most
significant bit.)
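
The search above can be sketched in Python as an exhaustive minimization of D = -2·g·P + g²·E over all gain and shape pairs. This is a brute-force stand-in, not the threshold shortcut with GB used in the pseudo-code. The function names, the 0-based indices, and the toy codebooks in the usage note are all assumptions for illustration.

```python
def search_codebook(pn, shapes, energies, gains):
    """Return 0-based (gain index, shape index, channel index) minimizing
    D = -2*g*P_j + g*g*E_j, where P_j is the inner product of pn with shape j."""
    best = None
    for j, shape in enumerate(shapes):
        cor = sum(p * s for p, s in zip(pn, shape))  # inner product P_j
        for i, g in enumerate(gains):
            d = -2.0 * g * cor + g * g * energies[j]
            if best is None or d < best[0]:
                best = (d, i, j)
    _, ig, ishape = best
    # Same packing as ICHAN = (IS-1)*NG + (IG-1), written with 0-based indices.
    return ig, ishape, ishape * len(gains) + ig


def channel_bits(ichan, width=10):
    """Bits of the channel index in transmission order, most significant bit first."""
    return [(ichan >> b) & 1 for b in range(width - 1, -1, -1)]
```

With a toy 4-level gain codebook `[0.5, 1.0, -0.5, -1.0]` and two unit-energy shapes, `search_codebook([0.0, -3.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], [0.5, 1.0, -0.5, -1.0])` selects gain index 3 and shape index 1.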



5.12 Simulated Decoder (block 8)

Blocks 20 and 23 have been described earlier. Blocks 19, 21, and 22 are specified below.

EXCITATION VQ CODEBOOK (block 19)

Input: IG, IS
Output: YN

Function: Perform table look-up to extract the best shape codevector and the best gain, then
multiply them to get the quantized excitation vector.

NN = (IS - 1) * IDIM
For K=1,2,...,IDIM, do the next line
   YN(K) = GQ(IG) * Y(NN+K)

GAIN SCALING UNIT (block 21)

Input: GAIN, YN
Output: ET

Function: Multiply the quantized excitation vector by the excitation gain.

For K=1,2,...,IDIM, do the next line
   ET(K) = GAIN * YN(K)



SYNTHESIS FILTER (block 22)

Input: ET, A
Output: ST

Function: Filter the gain-scaled excitation vector to obtain the quantized speech vector.

As explained in Section 3, this block can be omitted and the quantized speech vector can be
obtained as a by-product of the memory update procedure to be described below. If, however, one
wishes to implement this block anyway, a separate set of filter memory (rather than STATELPC)
should be used for this all-pole synthesis filter.

5.13 Filter Memory Update for Blocks 9 and 10

The following description of the filter memory update procedures for blocks 9 and 10 assumes
that the quantized speech vector ST is obtained as a by-product of the memory updates. To
safeguard against possible overloading of signal levels, a magnitude limiter is built into the
procedure so that the filter memory clips at MAX and MIN, where MAX and MIN are respectively
the positive and negative saturation levels of A-law or µ-law PCM, depending on which law is used.

FILTER MEMORY UPDATE (blocks 9 and 10)

Input: ET, A, AWZ, AWP, STATELPC, ZIRWFIR, ZIRWIIR
Output: ST, STATELPC, ZIRWFIR, ZIRWIIR

Function: Update the filter memory of blocks 9 and 10 and also obtain the quantized speech
vector.

ZIRWFIR(1)=ET(1)
TEMP(1)=ET(1)
For K=2,3,...,IDIM, do the following
   A0=ET(K)
   A1=0.
   A2=0.
   For I=K,K-1,...,2, do the next 5 lines
      ZIRWFIR(I)=ZIRWFIR(I-1)
      TEMP(I)=TEMP(I-1)
      A0=A0-A(I)*ZIRWFIR(I)                   | Compute zero-state responses
      A1=A1+AWZ(I)*ZIRWFIR(I)                 | at various stages of the
      A2=A2-AWP(I)*TEMP(I)                    | cascaded filter.
   ZIRWFIR(1)=A0
   TEMP(1)=A0+A1+A2
Repeat the above indented section for the next K

                                              | Now update filter memory by adding
                                              | zero-state responses to zero-input
                                              | responses
For K=1,2,...,IDIM, do the next 4 lines
   STATELPC(K)=STATELPC(K)+ZIRWFIR(K)
   If STATELPC(K) > MAX, set STATELPC(K)=MAX  | Limit the range.
   If STATELPC(K) < MIN, set STATELPC(K)=MIN
   ZIRWIIR(K)=ZIRWIIR(K)+TEMP(K)

For I=1,2,...,LPCW, do the next line          | Now set ZIRWFIR to the
   ZIRWFIR(I)=STATELPC(I)                     | right value.

I=IDIM+1
For K=1,2,...,IDIM, do the next line          | Obtain quantized speech by
   ST(K)=STATELPC(I-K)                        | reversing order of synthesis
                                              | filter memory.



5.14 Decoder (Figure 3/G.728)

The blocks in the decoder (Figure 3/G.728) are described below. Except for the output PCM
format conversion block, all other blocks are exactly the same as the blocks in the simulated
decoder (block 8) in Figure 2/G.728.

The decoder only uses a subset of the variables in Table 2/G.728. If a decoder and an encoder
are to be implemented in a single DSP chip, then the decoder variables should be given different
names to avoid overwriting the variables used in the simulated decoder block of the encoder. For
example, to name the decoder variables, we can add a prefix "d" to the corresponding variable
names in Table 2/G.728. If a decoder is to be implemented as a stand-alone unit independent of
an encoder, then there is no need to change the variable names.

The following description assumes a stand-alone decoder. Again, the blocks are executed in
the same order they are described below.

DECODER BACKWARD SYNTHESIS FILTER ADAPTER (block 33)

Input: ST
Output: A

Function: Generate synthesis filter coefficients periodically from previously decoded speech.
The operation of this block is exactly the same as block 23 of the encoder.

DECODER BACKWARD VECTOR GAIN ADAPTER (block 30)

Input: ET
Output: GAIN

Function: Generate the excitation gain from previous gain-scaled excitation vectors.
The operation of this block is exactly the same as block 20 of the encoder.



DECODER EXCITATION VQ CODEBOOK (block 29)

Input: ICHAN
Output: YN

Function: Decode the received best codebook index (channel index) to obtain the excitation
vector.

This block first extracts the 3-bit gain codebook index IG and the 7-bit shape codebook index IS
from the received 10-bit channel index. Then, the rest of the operation is exactly the same as
block 19 of the encoder.

NN = (IS - 1) * IDIM
For K=1,2,...,IDIM, do the next line
   YN(K) = GQ(IG) * Y(NN+K)
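
The index extraction performed by this block can be sketched as follows. This is a minimal Python illustration with 0-based indices and toy codebooks; the function name `decode_excitation` is hypothetical, and the split assumes the packing ICHAN = (IS-1)*NG + (IG-1) used by the encoder.

```python
def decode_excitation(ichan, gq, shapes):
    """Split the channel index into shape and gain indices (0-based) and
    return the gain-scaled shape codevector."""
    # The quotient selects the shape codevector, the remainder the gain level.
    shape_index, gain_index = divmod(ichan, len(gq))
    return [gq[gain_index] * v for v in shapes[shape_index]]
```

With an 8-level gain codebook and 128 shape codevectors, `divmod(ichan, 8)` maps the 10-bit index 0..1023 onto a 7-bit shape index and a 3-bit gain index.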



DECODER GAIN SCALING UNIT (block 31)

Input: GAIN, YN
Output: ET

Function: Multiply the excitation vector by the excitation gain.
The operation of this block is exactly the same as block 21 of the encoder.



DECODER SYNTHESIS FILTER (block 32)

Input: ET, A, STATELPC
Output: ST

Function: Filter the gain-scaled excitation vector to obtain the decoded speech vector.

This block can be implemented as a straightforward all-pole filter. However, as mentioned in
Section 4.3, if the encoder obtains the quantized speech as a by-product of filter memory update
(to save computation), and if potential accumulation of round-off error is a concern, then this
block should compute the decoded speech in exactly the same way as in the simulated decoder
block of the encoder. That is, the decoded speech vector should be computed as the sum of the
zero-input response vector and the zero-state response vector of the synthesis filter. This can be
done by the following procedure.

For K=1,2,...,IDIM, do the following
   TEMP(K)=0.
   For J=LPC,LPC-1,...,3,2, do the next 2 lines
      TEMP(K)=TEMP(K)-STATELPC(J)*A(J+1)      | Zero-input response.
      STATELPC(J)=STATELPC(J-1)
   TEMP(K)=TEMP(K)-STATELPC(1)*A(2)           | Handle last one
   STATELPC(1)=TEMP(K)                        | differently.
Repeat the above for the next K

TEMP(1)=ET(1)
For K=2,3,...,IDIM, do the next 5 lines
   A0=ET(K)
   For I=K,K-1,...,2, do the next 2 lines
      TEMP(I)=TEMP(I-1)
      A0=A0-A(I)*TEMP(I)                      | Compute zero-state response
   TEMP(1)=A0
Repeat the above 5 lines for the next K

                                              | Now update filter memory by adding
                                              | zero-state responses to zero-input
                                              | responses
For K=1,2,...,IDIM, do the next 3 lines
   STATELPC(K)=STATELPC(K)+TEMP(K)            | ZIR + ZSR
   If STATELPC(K) > MAX, set STATELPC(K)=MAX  | Limit the range.
   If STATELPC(K) < MIN, set STATELPC(K)=MIN

I=IDIM+1
For K=1,2,...,IDIM, do the next line          | Obtain quantized speech by
   ST(K)=STATELPC(I-K)                        | reversing order of synthesis
                                              | filter memory.
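
The decomposition used above relies on linearity: the output of an all-pole filter equals its zero-input response (old memory, zero input) plus its zero-state response (zero memory, new input). The Python sketch below demonstrates this on a generic all-pole filter. The function names are hypothetical, and the clipping limits passed in the example are placeholders rather than the A-law or µ-law saturation levels. Unlike the procedure above, the sketch clips only the returned samples and does not carry the clipped values forward as filter memory.

```python
def all_pole(x, a, mem):
    """y[n] = x[n] - sum_{i>=1} a[i] * y[n-i].

    `a` starts with 1.0; `mem` holds past outputs, most recent first,
    and has len(a) - 1 entries. Returns (outputs, updated memory).
    """
    mem = list(mem)
    out = []
    for v in x:
        y = v - sum(a[i] * mem[i - 1] for i in range(1, len(a)))
        mem = [y] + mem[:-1]  # shift the newest output into the memory
        out.append(y)
    return out, mem


def zir_plus_zsr(x, a, mem, lo, hi):
    """Sum of zero-input and zero-state responses, clipped to [lo, hi]."""
    zir, _ = all_pole([0.0] * len(x), a, mem)   # old memory, zero input
    zsr, _ = all_pole(x, a, [0.0] * len(mem))   # zero memory, new input
    return [min(hi, max(lo, r + s)) for r, s in zip(zir, zsr)]
```

When no sample reaches the limits, `zir_plus_zsr` agrees with a single direct pass through `all_pole` using the same starting memory.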



10th-ORDER LPC INVERSE FILTER (block 81)

This block is executed once a vector, and the output vector is written sequentially into the last 20
samples of the LPC prediction residual buffer (i.e. D(81) through D(100)). We use a pointer IP to
point to the address of D(K) array samples to be written to. This pointer IP is initialized to
NPWSZ-NFRSZ+IDIM before this block starts to process the first decoded speech vector of the
first adaptation cycle (frame), and from there on IP is updated in the way described below. The
10th-order LPC predictor coefficients APF(I)'s are obtained in the middle of Levinson-Durbin
recursion by block 50, as described in Section 4.6. It is assumed that before this block starts
execution, the decoder synthesis filter (block 32 of Figure 3/G.728) has already written the current
decoded speech vector into ST(1) through ST(IDIM).

Input: ST, APF
Output: D

Function: Compute the LPC prediction residual for the current decoded speech vector.

If IP = NPWSZ, then set IP = NPWSZ - NFRSZ    | check & update IP
For K=1,2,...,IDIM, do the next 7 lines
   ITMP=IP+K
   D(ITMP)=ST(K)
   For J=10,9,...,3,2, do the next 2 lines
      D(ITMP)=D(ITMP)+STLPCI(J)*APF(J+1)      | FIR filtering.
      STLPCI(J)=STLPCI(J-1)                   | Memory shift.
   D(ITMP)=D(ITMP)+STLPCI(1)*APF(2)           | Handle last one.
   STLPCI(1)=ST(K)                            | shift in input.
IP=IP+IDIM                                    | update IP.

PITCH PERIOD EXTRACTION MODULE (block 82)

This block is executed once a frame at the third vector of each frame, after the third decoded
speech vector is generated.

Input: D
Output: KP

Function: Extract the pitch period from the LPC prediction residual.

If ICOUNT ≠ 3, skip the execution of this block;
Otherwise, do the following.

                                              | lowpass filtering & 4:1 downsampling.
For K=NPWSZ-NFRSZ+1,...,NPWSZ, do the next 7 lines
   TMP=D(K)-STLPF(1)*AL(1)-STLPF(2)*AL(2)-STLPF(3)*AL(3)        | IIR filter
   If K is divisible by 4, do the next 2 lines
      N=K/4                                   | do FIR filtering only if needed.
      DEC(N)=TMP*BL(1)+STLPF(1)*BL(2)+STLPF(2)*BL(3)+STLPF(3)*BL(4)
   STLPF(3)=STLPF(2)
   STLPF(2)=STLPF(1)                          | shift lowpass filter memory.
   STLPF(1)=TMP

M1=KPMIN/4                                    | start correlation peak-picking in
M2=KPMAX/4                                    | the decimated LPC residual domain.
CORMAX=most negative number of the machine
For J=M1,M1+1,...,M2, do the next 6 lines
   TMP=0.
   For N=1,2,...,NPWSZ/4, do the next line
      TMP=TMP+DEC(N)*DEC(N-J)                 | TMP = correlation in decimated domain
   If TMP > CORMAX, do the next 2 lines
      CORMAX=TMP                              | find maximum correlation and
      KMAX=J                                  | the corresponding lag.

For N=-M2+1,-M2+2,...,(NPWSZ-NFRSZ)/4, do the next line
   DEC(N)=DEC(N+IDIM)                         | shift decimated LPC residual buffer.

M1=4*KMAX-3                                   | start correlation peak-picking in
M2=4*KMAX+3                                   | the undecimated domain.
If M1 < KPMIN, set M1 = KPMIN.                | check whether M1 out of range.
If M2 > KPMAX, set M2 = KPMAX.                | check whether M2 out of range.
CORMAX=most negative number of the machine
For J=M1,M1+1,...,M2, do the next 6 lines
   TMP=0.
   For K=1,2,...,NPWSZ, do the next line
      TMP=TMP+D(K)*D(K-J)                     | correlation in undecimated domain.
   If TMP > CORMAX, do the next 2 lines
      CORMAX=TMP                              | find maximum correlation and
      KP=J                                    | the corresponding lag.

M1=KP1-KPDELTA                                | determine the range of search around
M2=KP1+KPDELTA                                | the pitch period of previous frame.
If KP < M2+1, go to LABEL.                    | KP can't be a multiple pitch if true.
If M1 < KPMIN, set M1 = KPMIN.                | check whether M1 out of range.
CMAX=most negative number of the machine
For J=M1,M1+1,...,M2, do the next 6 lines
   TMP=0.
   For K=1,2,...,NPWSZ, do the next line
      TMP=TMP+D(K)*D(K-J)                     | correlation in undecimated domain.
   If TMP > CMAX, do the next 2 lines
      CMAX=TMP                                | find maximum correlation and
      KPTMP=J                                 | the corresponding lag.

SUM=0.
TMP=0.                                        | start computing the tap weights
For K=1,2,...,NPWSZ, do the next 2 lines
   SUM = SUM + D(K-KP)*D(K-KP)
   TMP = TMP + D(K-KPTMP)*D(K-KPTMP)
If SUM=0, set TAP=0; otherwise, set TAP=CORMAX/SUM.
If TMP=0, set TAP1=0; otherwise, set TAP1=CMAX/TMP.
If TAP > 1, set TAP = 1.                      | clamp TAP between 0 and 1
If TAP < 0, set TAP = 0.
If TAP1 > 1, set TAP1 = 1.                    | clamp TAP1 between 0 and 1
If TAP1 < 0, set TAP1 = 0.
                                              | Replace KP with fundamental pitch if
                                              | TAP1 is large enough.
If TAP1 > TAPTH * TAP, then set KP = KPTMP.

LABEL: KP1 = KP                               | update pitch period of previous frame
For K=-KPMAX+1,-KPMAX+2,...,NPWSZ-NFRSZ, do the next line
   D(K) = D(K+NFRSZ)                          | shift the LPC residual buffer



PITCH PREDICTOR TAP CALCULATOR (block 83)

This block is also executed once a frame at the third vector of each frame, right after the execution
of block 82. This block shares the decoded speech buffer (ST(K) array) with the long-term
postfilter 71, which takes care of the shifting of the array such that ST(1) through ST(IDIM)
constitute the current vector of decoded speech, and ST(-KPMAX-NPWSZ+1) through ST(0) are
previous vectors of decoded speech.

Input: ST, KP
Output: PTAP

Function: Calculate the optimal tap weight of the single-tap pitch predictor of the decoded
speech.

If ICOUNT ≠ 3, skip the execution of this block;
Otherwise, do the following.

SUM=0.
TMP=0.
For K=-NPWSZ+1,-NPWSZ+2,...,0, do the next 2 lines
   SUM = SUM + ST(K-KP)*ST(K-KP)
   TMP = TMP + ST(K)*ST(K-KP)
If SUM=0, set PTAP=0; otherwise, set PTAP=TMP/SUM.
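
The tap computed here is the least-squares gain of a one-tap predictor that predicts each sample from the sample one pitch period earlier. A minimal Python sketch follows. The function name `pitch_tap` and the buffer layout (a flat list whose last `window` samples form the analysis window) are assumptions for illustration, not the ST(K) indexing of the pseudo-code.

```python
def pitch_tap(st, kp, window):
    """Optimal single-tap pitch predictor weight over the last `window`
    samples of `st`, for a pitch lag of `kp` samples."""
    n = len(st)
    if n - window - kp < 0:
        raise ValueError("buffer too short for this window and lag")
    num = 0.0  # cross-correlation between the signal and its lagged copy
    den = 0.0  # energy of the lagged copy
    for k in range(n - window, n):
        num += st[k] * st[k - kp]
        den += st[k - kp] * st[k - kp]
    return 0.0 if den == 0.0 else num / den
```

A signal that repeats exactly every `kp` samples gives a tap of 1, and one that halves in amplitude each period gives 0.5.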



LONG-TERM POSTFILTER COEFFICIENT CALCULATOR (block 84)

This block is also executed once a frame at the third vector of each frame, right after the execution
of block 83.

Input: PTAP
Output: B, GL

Function: Calculate the coefficient b and the scaling factor gl of the long-term postfilter.

If ICOUNT ≠ 3, skip the execution of this block;
Otherwise, do the following.

If PTAP > 1, set PTAP = 1.                    | clamp PTAP at 1.
If PTAP < PPFTH, set PTAP = 0.                | turn off pitch postfilter if
                                              | PTAP smaller than threshold.
B = PPFZCF * PTAP
GL = 1 / (1+B)
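
A hedged Python sketch of this coefficient calculation is given below. It assumes the form b = PPFZCF·PTAP and gl = 1/(1+b). The function name and the default values chosen for the threshold `ppfth` and the scaling constant `ppfzcf` are illustrative assumptions rather than values stated in this section.

```python
def long_term_postfilter_coeffs(ptap, ppfth=0.6, ppfzcf=0.15):
    """Return (b, gl) for the long-term postfilter from the pitch predictor tap.

    ppfth and ppfzcf are illustrative defaults (assumptions).
    """
    if ptap > 1.0:
        ptap = 1.0   # clamp the tap at 1
    if ptap < ppfth:
        ptap = 0.0   # turn the pitch postfilter off for weak periodicity
    b = ppfzcf * ptap
    gl = 1.0 / (1.0 + b)  # keeps the filter's peak gain at unity
    return b, gl
```

With b = 0 the long-term postfilter reduces to a pass-through, since gl is then exactly 1.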



SHORT-TERM POSTFILTER COEFFICIENT CALCULATOR (block 85)

This block is also executed once a frame, but it is executed at the first vector of each frame.

Input: APF, RCTMP(1)
Output: AP, AZ, TILTZ

Function: Calculate the coefficients of the short-term postfilter.

If ICOUNT ≠ 1, skip the execution of this block;
Otherwise, do the following.

For I=2,3,...,11, do the next 2 lines
   AP(I)=SPFPCFV(I)*APF(I)                    | scale denominator coeff.
   AZ(I)=SPFZCFV(I)*APF(I)                    | scale numerator coeff.
TILTZ=TILTF*RCTMP(1)                          | tilt compensation filter coeff.



LONG-TERM POSTFILTER (block 71)

This block is executed once a vector.

Input: ST, B, GL, KP
Output: TEMP

Function: Perform filtering operation of the long-term postfilter.

For K=1,2,...,IDIM, do the next line
   TEMP(K)=GL*(ST(K)+B*ST(K-KP))              | long-term postfiltering.

For K=-NPWSZ-KPMAX+1,...,-2,-1,0, do the next line
   ST(K)=ST(K+IDIM)                           | shift decoded speech buffer.

SHORT-TERM POSTFILTER (block 72)

This block is executed once a vector right after the execution of block 71.

Input: AP, AZ, TILTZ, STPFFIR, STPFIIR, TEMP (output of block 71)
Output: TEMP

Function: Perform filtering operation of the short-term postfilter.

For K=1,2,...,IDIM, do the following
   TMP = TEMP(K)
   For J=10,9,...,3,2, do the next 2 lines
      TEMP(K) = TEMP(K) + STPFFIR(J)*AZ(J+1)  | All-zero part
      STPFFIR(J) = STPFFIR(J-1)               | of the filter.
   TEMP(K) = TEMP(K) + STPFFIR(1)*AZ(2)       | Last multiplier.
   STPFFIR(1) = TMP
   For J=10,9,...,3,2, do the next 2 lines
      TEMP(K) = TEMP(K) - STPFIIR(J)*AP(J+1)  | All-pole part
      STPFIIR(J) = STPFIIR(J-1)               | of the filter.
   TEMP(K) = TEMP(K) - STPFIIR(1)*AP(2)       | Last multiplier.
   STPFIIR(1) = TEMP(K)
   TEMP(K) = TEMP(K) + STPFIIR(2)*TILTZ       | Spectral tilt compensation
                                              | filter.



SUM OF ABSOLUTE VALUE CALCULATOR (block 73)

This block is executed once a vector after execution of block 32.

Input: ST
Output: SUMUNFIL

Function: Calculate the sum of absolute values of the components of the decoded speech
vector.

SUMUNFIL=0.
For K=1,2,...,IDIM, do the next line
   SUMUNFIL = SUMUNFIL + absolute value of ST(K)

SUM OF ABSOLUTE VALUE CALCULATOR (block 74)

This block is executed once a vector after execution of block 72.

Input: TEMP (output of block 72)
Output: SUMFIL

Function: Calculate the sum of absolute values of the components of the short-term postfilter
output vector.

SUMFIL=0.
For K=1,2,...,IDIM, do the next line
   SUMFIL = SUMFIL + absolute value of TEMP(K)

SCALING FACTOR CALCULATOR (block 75)

This block is executed once a vector after execution of blocks 73 and 74.

Input: SUMUNFIL, SUMFIL
Output: SCALE

Function: Calculate the overall scaling factor of the postfilter.

If SUMFIL > 1, set SCALE = SUMUNFIL / SUMFIL;
Otherwise, set SCALE = 1.

FIRST-ORDER LOWPASS FILTER (block 76) and OUTPUT GAIN SCALING UNIT (block 77)

These two blocks are executed once a vector after execution of blocks 72 and 75. It is more
convenient to describe the two blocks together.

Input: SCALE, TEMP (output of block 72)
Output: SPF

Function: Lowpass filter the once-a-vector scaling factor and use the filtered scaling factor to
scale the short-term postfilter output vector.

For K=1,2,...,IDIM, do the following
   SCALEFIL = AGCFAC*SCALEFIL + (1-AGCFAC)*SCALE    | lowpass filtering
   SPF(K) = SCALEFIL*TEMP(K)                        | scale output.
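
Blocks 73 through 77 together form an automatic gain control that restores the postfiltered vector to roughly the level of the unfiltered decoded speech. The sketch below combines them in Python. The function name `postfilter_gain_control` is hypothetical, and the default smoothing factor `agcfac=0.99` is an assumed placeholder rather than a value given in this section.

```python
def postfilter_gain_control(st, temp, scalefil, agcfac=0.99):
    """Scale the postfilter output `temp` toward the level of decoded speech `st`.

    `scalefil` is the smoothed scaling factor carried over from the previous
    vector. Returns (scaled output, updated scalefil).
    """
    sumunfil = sum(abs(v) for v in st)    # block 73
    sumfil = sum(abs(v) for v in temp)    # block 74
    # Block 75: avoid dividing by a very small filtered level.
    scale = sumunfil / sumfil if sumfil > 1.0 else 1.0
    out = []
    for v in temp:
        # Block 76: first-order lowpass smoothing, updated once per sample.
        scalefil = agcfac * scalefil + (1.0 - agcfac) * scale
        out.append(scalefil * v)          # block 77
    return out, scalefil
```

The returned `scalefil` would be passed back in when processing the next vector, so the gain changes smoothly across vector boundaries.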



OUTPUT PCM FORMAT CONVERSION (block 28)

Input: SPF
Output: SD

Function: Convert the 5 components of the decoded speech vector into 5 corresponding A-law
or µ-law PCM samples and put them out sequentially at 125 µs time intervals.

The conversion rules from uniform PCM to A-law or µ-law PCM are specified in
Recommendation G.711.

ANNEX A
(to Recommendation G.728)

HYBRID WINDOW FUNCTIONS FOR VARIOUS LPC ANALYSES IN LD-CELP



In the LD-CELP coder, we use three separate LPC analyses to update the coefficients of three
filters: (1) the synthesis filter, (2) the log-gain predictor, and (3) the perceptual weighting filter.
Each of these three LPC analyses has its own hybrid window. For each hybrid window, we list the
values of window function samples that are used in the hybrid windowing calculation procedure.
These window functions were first designed using floating-point arithmetic and then quantized to
the numbers which can be exactly represented by 16-bit representations with 15 bits of fraction.
For each window, we will first give a table containing the floating-point equivalent of the 16-bit
numbers and then give a table with corresponding 16-bit integer representations.
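
The relation between the two kinds of table can be sketched in Python as a conversion to and from a 16-bit format with 15 fractional bits. The helper names `to_q15` and `from_q15` are hypothetical.

```python
Q15_SCALE = 1 << 15  # 32768, i.e. 15 bits of fraction


def to_q15(x):
    """Quantize a value in [-1, 1) to the nearest 16-bit integer with 15 fractional bits."""
    return round(x * Q15_SCALE)


def from_q15(n):
    """Recover the floating-point equivalent of a 16-bit integer table entry."""
    return n / Q15_SCALE
```

For instance, `from_q15(32767)` gives 0.999969482421875, the largest value representable in this format.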

A.1 Hybrid Window for the Synthesis Filter

The following table contains the first 105 samples of the window function for the synthesis
filter. The first 35 samples are the non-recursive portion, and the rest are the recursive portion.
The table should be read from left to right from the first row, then left to right for the second row,
and so on (just like the raster scan line).



0.047760010  0.095428467  0.142852783  0.189971924  0.236663818
0.282775879  0.328277588  0.373016357  0.416900635  0.459838867
0.501739502  0.542480469  0.582000732  0.620178223  0.656921387
0.692199707  0.725891113  0.757904053  0.788208008  0.816680908
0.843322754  0.868041992  0.890747070  0.911437988  0.930053711
0.946533203  0.960876465  0.973022461  0.982910156  0.990600586
0.996002197  0.999114990  0.999969482  0.998565674  0.994842529
0.988861084  0.981781006  0.974731445  0.967742920  0.960815430
0.953948975  0.947082520  0.940307617  0.933563232  0.926879883
0.920227051  0.913635254  0.907104492  0.900604248  0.894134521
0.887725830  0.881378174  0.875061035  0.868774414  0.862548828
0.856384277  0.850250244  0.844146729  0.838104248  0.832092285
0.826141357  0.820220947  0.814331055  0.808502197  0.802703857
0.796936035  0.791229248  0.785583496  0.779937744  0.774353027
0.768798828  0.763305664  0.757812500  0.752380371  0.747009277
0.741638184  0.736328125  0.731048584  0.725830078  0.720611572
0.715454102  0.710327148  0.705230713  0.700164795  0.695159912
0.690185547  0.685241699  0.680328369  0.675445557  0.670593262
0.665802002  0.661041260  0.656280518  0.651580811  0.646911621
0.642272949  0.637695313  0.633117676  0.628570557  0.624084473
0.619598389  0.615142822  0.610748291  0.606384277  0.602020264

The next table contains the corresponding 16-bit integer representation. Dividing the table entries
by 2^15 = 32768 gives the table above.
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17776 




*>n777 












*>A7* ! 

/ U i 








7QRAA 
."QOu 


7A47rS 


J IUIU 








77dATi 




J-C I J 7 


777A7 

Jim lQ f 


7777 1 


77^00 




J<- 1 / 1 


7 1 am 


7 t 7 1 1 
J 1 / 1 I 


*5 1 4 Q/4 


7 1 7<Q 




ins i 7 




-HJ j / £ 


7f)! ^4 


-77JO 


7Q774 


*>Q^ 1 T 

-7J 1 1 








28674


28468


28264 


28062 


27861 


27661 


27463 


27266 


27071 


26877 


26684 


26493 


26303 


26114 


25927


25742 


25557 


25374 


25192 


25012


24832 


24654 


24478 


24302 


24128 


23955 


23784 


23613 


23444 


23276 


23109 


22943 


22779 


22616 


22454 


22293 


22133 


21974 


21817 


21661 


21505 


21351 


21198 


21046 


20896 


20746 


20597 


20450 


20303 


20157 


20013 


19870 


19727 



A.2 Hybrid Window for the Log-Gain Predictor

The following table contains the first 34 samples of the window function for the log-gain
predictor. The first 20 samples are the non-recursive portion, and the rest are the recursive
portion. The table should be read in the same manner as the two tables above.



0.092346191  0.183868408  0.273834229  0.361480713  0.446014404
0.526763916  0.602996826  0.674072266  0.739379883  0.798400879
0.850585938  0.895507813  0.932769775  0.962066650  0.983154297
0.995819092  0.999969482  0.995635986  0.982757568  0.961486816
0.932006836  0.899078369  0.867309570  0.836669922  0.807128906
0.778625488  0.751129150  0.724578857  0.699005127  0.674316406
0.650482178  0.627502441  0.605346680  0.583953857

The next table contains the corresponding 16-bit integer representation. Dividing the table
entries by 2^15 = 32768 gives the table above.

 3026   6025   8973  11845  14615
17261  19759  22088  24228  26162
27872  29344  30565  31525  32216
32631  32767  32625  32203  31506
30540  29461  28420  27416  26448
25514  24613  23743  22905  22096
21315  20562  19836  19135


A.3 Hybrid Window for the Perceptual Weighting Filter

The following table contains the first 60 samples of the window function for the perceptual
weighting filter. The first 20 samples are the non-recursive portion, and the rest are the recursive
portion. The table should be read in the same manner as the four tables above.



0.059722900  0.119262695  0.178375244  0.236816406  0.294433594
0.351013184  0.406311035  0.460174561  0.512390137  0.562774658
0.611145020  0.657348633  0.701171875  0.742523193  0.781219482
0.817108154  0.850097656  0.880035400  0.906829834  0.930389404
0.950622559  0.967468262  0.980865479  0.990722656  0.997070313
0.999847412  0.999084473  0.994720459  0.986816406  0.975372314
0.960449219  0.943939209  0.927734375  0.911804199  0.896148682
0.880737305  0.865600586  0.850738525  0.836120605  0.821746826
0.807647705  0.793762207  0.780120850  0.766723633  0.753570557
0.740600586  0.727874756  0.715393066  0.703094482  0.691009521
0.679138184  0.667480469  0.656005859  0.644744873  0.633666992
0.622772217  0.612091064  0.601562500  0.591217041  0.581085205

The next table contains the corresponding 16-bit integer representation. Dividing the table
entries by 2^15 = 32768 gives the table above.

 1957   3908   5845   7760   9648
11502  13314  15079  16790  18441
20026  21540  22976  24331  25599
26775  27856  28837  29715  30487
31150  31702  32141  32464  32672
32763  32738  32595  32336  31961
31472  30931  30400  29878  29365
28860  28364  27877  27398  26927
26465  26010  25563  25124  24693
24268  23851  23442  23039  22643
22254  21872  21496  21127  20764
20407  20057  19712  19373  19041
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ANNEX B 
(to Recommendation G.728) 

EXCITATION SHAPE AND GAIN CODEBOOK TABLES 



This appendix first gives the 7-bit excitation VQ shape codebook table. Each row in the table
specifies one of the 128 shape codevectors. The first column is the channel index associated with
each shape codevector (obtained by a Gray-code index assignment algorithm). The second
through the sixth columns are the first through the fifth components of the 128 shape codevectors
as represented in 16-bit fixed point. To obtain the floating point value from the integer value,
divide the integer value by 2048. This is equivalent to multiplication by 2^-11 or shifting the binary
point 11 bits to the left.
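
The fixed-point convention for the shape codebook can be sketched in Python as follows. The helper name `shape_component_to_float` is hypothetical.

```python
Q11_SCALE = 1 << 11  # 2048, i.e. 11 bits of fraction


def shape_component_to_float(n):
    """Convert a 16-bit fixed-point shape codebook entry to floating point."""
    return n / Q11_SCALE


def shape_vector_to_float(row):
    """Convert the five integer components of one shape codevector."""
    return [shape_component_to_float(n) for n in row]
```

With 11 fractional bits in a signed 16-bit word, the representable range of a component is -16.0 up to just under +16.0.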



Channel 












tnoex 






f* jm ru~u~L^ n r c 
IwCXHpCHiCIllo 






u 


OOO 




* liJ** 


.1790 

* I / TV 


-2553 


i 




— ♦ J / / 












'AO 1 / 






-4450 


3 


-6679 


-340 


1482 


•1276 


1262 


4 


-562 


-6757 


1281 


179 


-1274 


5 


-2512 


-7130 


-4925 


6913 


2411 


6 


-2478 


-156 


4683 


-3873 


0 


7 


-8208 


2140 


-»78 


-2785 


533 


8 


1889 


2759 


1381 


-6955 


-5913 


9 


5082 


-2460 


-5778 
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-201 1 


•97! 3 
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-5620 
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-5934 


2131 


1743 


-2006 


3342 


-1583 


-1831 


6397 


-6528 


5309 


748 


1935 


5366 


3193 


-370 


1866 


-2690 


r2577 


2235 


-1850 


3880 


-2465 


2829 


5588 


-1918 


5955 


3908 


5798 


5444 


-2570 


-2086 


3532 


950 


4980 


3502 


1719 


263 


2114 


-1208 


9347 


-439 


8028 
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2008 
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2759 


1850 


7361 


-5768 
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194 
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816 
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30 


-1370 


-246 


-2627 


3170 


-1062 


799 


14937 


10706 


-5057 


-1153 


4285 


666 


-2119 


-1697 


110 


2136 


-3500 


-1855 


-558 


1709 


-454 


-2957 
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69 


-2839 


70 


-189 


71 


-2842 


72 


1517 


73 


1913 


74 


-2903 


75 


-2913 


76 


1844 


77 


467 


78 


-127 


79 


873 


80 


2311 


81 


641 


82 


-45 


83 


-2004 


84 


2936 


85 


2827 


86 


3199 


87 


2948 


88 


4286 


89 


3903 


90 


-606 


91 


-525 


92 


4297 


93 


5765 


94 


2735 


95 


4033 


96 


74 


97 


-2496 


98 


-2168 


99 


-3552 


100 


-2613 


101 


-1747 


102 
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103 


-1684 


104 


2707 
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2517 


106 


-148 
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-527 
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no 


2574 
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-;666 


-273 


-2376 


1663 


-1369


636 


79 


-3013 


-2493 


-5312 


-3324 


-3756 


-1547 


-2760 


-1834 


456 


-4256 


-1909 
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-637 


-2045 


-3828 


-1817


2632 


1194


1893 


1198 


2160 


1713 


3518 


-3968 


1280 


8 


-1928 


-816 


2687 


4029 


394 


51 


^»507 


5646 


-5588 


1234 


-1607 


3620 


-2192 


-3251 


-2283 


528 


-3287 


1241 
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1648 


-2965 


918 


1999 


-1605 


2034 


2037 


15 


1530 


581 


-2338 


3621 


81 


5538 


867 


214 


2816 


-229 


504 


479 


-1487 


-1596 


2206 


^t288 


1243 


-2731 


-1501 


3688 


-3369 


1875 


2513 


1449 


1826 


-2497 


•220 


3418 



20S4 
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-1040 


•2449 


-248 


-2677 


-3669 


-973 


-749 


1271 


-3690 


-1829 


-1406 


1124 


706 


-4272 


1521 


1134 


-1491 


-6494 


-2792 


-578 


-3052 


1968 


4107 


6342 


-1449 


2203 


2652 


4251 
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-1476 


2658 


3513 


-1741 


-1407 


-253 


1298 


-32 


-659 


-2592 


5707 


-5187 


664 


-2527 


1707 


812 


-2264 


1352 


1672 


-3273 


-3407 


-1174 


1444 


915 


-1026 


2950 


229 


-1264 


-208 


1491 


962 


-1488 


-2183 


1432 


-2257 


-2284 


-1510 


2551 


-1389 


2783 


-1009 
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1929 


1292 


-1401 


1909 


1280 


610 


-*39l 


3636 


-1217 


-3074 


-4979 


4234 


-1077 
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1115 
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I li 
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•758 


-2455 
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85 
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-1466 
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3852 


1579 


-77 


2064 


868 


122 


5109 


2919 


-202 


359 


-509 


123 


3650 


3206 


2303 


1693 


1296 


124 


2905 


-3907 


229 
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-2332 
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5977 


-3585 


305 


3825 


-3138 
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3746 


^506 


53 


-269 


-3301 


127 


606 


20 1 S 


-1316 


4064

Next we give the values for the gain codebook. This table not only includes the values for GQ,
but also the values for GB, G2 and GSQ as well. Both GQ and GB can be represented exactly in
16-bit arithmetic using Q13 format. The fixed point representation of G2 is just the same as GQ,
except the format is now Q12. An approximate representation of GSQ to the nearest integer in
fixed point Q12 format will suffice.



Array Index     GQ **           GB              G2              GSQ

    1           0.515625        0.708984375     1.03125         0.26586914
    2           0.90234375      1.240722656     1.8046875       0.814224243
    3           1.579101563     2.171264649     3.158203126     2.493561746
    4           2.763427734     *               5.526855468     7.636532841
    5           -GQ(1)          -GB(1)          -G2(1)          GSQ(1)
    6           -GQ(2)          -GB(2)          -G2(2)          GSQ(2)
    7           -GQ(3)          -GB(3)          -G2(3)          GSQ(3)
    8           -GQ(4)          *               -G2(4)          GSQ(4)

*  Can be any arbitrary value (not used).

** Note that GQ(1) = 33/64, and GQ(i) = (7/4)GQ(i-1) for i = 2, 3, 4.

Values of Gain Codebook Related Arrays
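The relationships among the four gain arrays can be checked with a short sketch. Only the recursion GQ(1) = 33/64, GQ(i) = (7/4)GQ(i-1) is stated in the table note. The other three rules used here (GB as the midpoint of adjacent GQ values, G2 = 2·GQ, GSQ = GQ squared) are inferred from the tabulated numbers and should be read as assumptions. The helper name `to_q` is hypothetical.

```python
from fractions import Fraction

# GQ(1) = 33/64 and GQ(i) = (7/4) * GQ(i-1) for i = 2, 3, 4 (from the table note).
gq = [Fraction(33, 64)]
for _ in range(3):
    gq.append(gq[-1] * Fraction(7, 4))

# Assumed relationships, inferred from the tabulated values:
gb = [(gq[i] + gq[i + 1]) / 2 for i in range(3)]  # midpoints between adjacent GQ levels
g2 = [2 * g for g in gq]                          # twice GQ
gsq = [g * g for g in gq]                         # GQ squared


def to_q(value, frac_bits):
    """Round a value to the nearest integer in Q<frac_bits> fixed-point format."""
    return round(value * (1 << frac_bits))


if __name__ == "__main__":
    print("GQ  in Q13:", [to_q(g, 13) for g in gq])
    print("GB  in Q13:", [to_q(g, 13) for g in gb])
    print("G2  in Q12:", [to_q(g, 12) for g in g2])
    print("GSQ in Q12:", [to_q(g, 12) for g in gsq])
```

Under these assumptions the Q12 integers for G2 coincide with the Q13 integers for GQ, which matches the remark that the fixed point representation of G2 is the same as that of GQ with only the format changed.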
ANNEX C
(to Recommendation G.728)

VALUES USED FOR BANDWIDTH BROADENING

The following table gives the integer values for the pole control, zero control and bandwidth
broadening vectors listed in Table 2. To obtain the floating point value, divide the integer value
by 16384. The values in this table represent these floating point values in the Q14 format, the
most commonly used format to represent numbers less than 2 in 16 bit fixed point arithmetic.





 i     FACV    FACGPV   WPCFV   WZCFV   SPFPCFV  SPFZCFV

 1     16384   16384    16384   16384   16384    16384
 2     16192   14848     9830   14746   12288    10650
 3     16002   13456     5898   13271    9216     6922
 4     15815   12195     3539   11944    6912     4499
 5     15629   11051     2123   10750    5184     2925
 6     15446   10015     1274    9675    3888     1901
 7     15265    9076      764    8707    2916     1236
 8     15086    8225      459    7836    2187      803
 9     14910    7454      275    7053    1640      522
10     14735    6755      165    6347    1230      339
11     14562    6122       99    5713     923      221
12     14391
13     14223
14     14056
15     13891
16     13729
17     13568
18     13409
19     13252
20     13096
21     12943
22     12791
23     12641
24     12493
25     12347
26     12202
27     12059
28     11918
29     11778
30     11640
31     11504
32     11369
33     11236
34     11104
35     10974
36     10845
37     10718
38     10593
39     10468
40     10346
41     10225
42     10105
43      9986
44      9869
45      9754
46      9639
47      9526
48      9415
49      9304
50      9195
51      9088
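The Q14 convention used in this table can be illustrated with a minimal sketch. Dividing by 16384 is taken from the annex text. The per-step factors 253/256 (FACV) and 3/5 (WPCFV) are not stated in this section: they are inferred here from the ratio of the second table row to 16384, and the assumption that each vector is a rounded geometric sequence has only been checked against the first rows. The function names are hypothetical.

```python
from fractions import Fraction

Q14 = 1 << 14  # 16384, the scale of the Q14 format


def q14_to_float(n):
    """Convert a Q14 integer from the table to its floating point value."""
    return n / Q14


def geometric_q14(factor, length):
    """Q14 integers round(factor**(i-1) * 16384) for i = 1..length (assumed rule)."""
    return [round(factor ** k * Q14) for k in range(length)]


if __name__ == "__main__":
    # Factors inferred from the second table row; treat them as assumptions.
    print("FACV  head:", geometric_q14(Fraction(253, 256), 5))
    print("WPCFV head:", geometric_q14(Fraction(3, 5), 5))
    print("FACV(2) as float:", q14_to_float(16192))
```

Exact rational arithmetic is used for the powers so that the rounding step is not affected by floating point error.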



ANNEX D 
(to Recommendation G.728) 

COEFFICIENTS OF THE 1 kHz LOWPASS ELLIPTIC FILTER 
USED IN PITCH PERIOD EXTRACTION MODULE (BLOCK 82) 



The 1 kHz lowpass filter used in the pitch lag extraction and encoding module (block 82) is a
third-order pole-zero filter with a transfer function of

$$L(z) = \frac{\sum_{i=0}^{3} b_i z^{-i}}{1 + \sum_{i=1}^{3} a_i z^{-i}}$$

where the coefficients a_i's and b_i's are given in the following table.

 i      a_i               b_i

 0                        0.0357081667
 1      -2.34036589       -0.0069956244
 2      2.01190019        -0.0069956244
 3      -0.614109218      0.0357081667
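As an illustration of how the tabulated coefficients would be applied, the following is a minimal direct-form sketch of a third-order pole-zero filter. It assumes the conventional form with the b coefficients in the numerator and 1 plus the a coefficients in the denominator. The function name `lowpass_1khz` and the list layout are hypothetical, not part of the Recommendation.

```python
# Coefficients from the table above; A[0] = 1 is the implied leading denominator term.
A = [1.0, -2.34036589, 2.01190019, -0.614109218]
B = [0.0357081667, -0.0069956244, -0.0069956244, 0.0357081667]


def lowpass_1khz(x):
    """Filter the sequence x with y[n] = sum_i b_i x[n-i] - sum_{i>=1} a_i y[n-i].

    Samples before the start of x are taken as zero (an assumption of this sketch).
    """
    y = []
    for n in range(len(x)):
        acc = sum(B[i] * x[n - i] for i in range(4) if n - i >= 0)
        acc -= sum(A[i] * y[n - i] for i in range(1, 4) if n - i >= 0)
        y.append(acc)
    return y


if __name__ == "__main__":
    step = lowpass_1khz([1.0] * 500)
    print("DC gain from coefficients:", sum(B) / sum(A))
    print("step response after 500 samples:", step[-1])
```

The ratio sum(B)/sum(A) is the gain at zero frequency. With the tabulated values it is very close to one, as expected of a lowpass filter, and serves as a quick consistency check on the coefficients.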
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ANNEX E 
(to Recommendation G.728)

TIME SCHEDULING THE SEQUENCE OF COMPUTATIONS

All of the computation in the encoder and decoder can be divided up into two classes.
Included in the first class are those computations which take place once per vector. Sections 3
through 5.14 note which computations these are. Generally they are the ones which involve or
lead to the actual quantization of the excitation signal and the synthesis of the output signal.
Referring specifically to the block numbers in Fig. 2, this class includes blocks 1, 2, 4, 9, 10, 11,
13, 16, 17, 18, 21, and 22. In Fig. 3, this class includes blocks 28, 29, 31, 32 and 34. In Fig. 6,
this class includes blocks 39, 40, 41, 42, 46, 47, 48, and 67. (Note that Fig. 6 is applicable to both
block 20 in Fig. 2 and block 30 in Fig. 3. Blocks 43, 44 and 45 of Fig. 6 are not part of this class.
Thus, blocks 20 and 30 are part of both classes.)

In the other class are those computations which are only done once for every four vectors.
Once more referring to Figures 2 through 8, this class includes blocks 3, 12, 14, 15, 23, 33, 35, 36,
37, 38, 43, 44, 45, 49, 50, 51, 81, 82, 83, 84, and 85. All of the computations in this second class
are associated with updating one or more of the adaptive filters or predictors in the coder. In the
encoder there are three such adaptive structures, the 50th order LPC synthesis filter, the vector
gain predictor, and the perceptual weighting filter. In the decoder there are four such structures, the
synthesis filter, the gain predictor, and the long term and short term adaptive postfilters. Included
in the descriptions of sections 3 through 5.14 are the times and input signals for each of these five
adaptive structures. Although it is redundant, this appendix explicitly lists all of this timing
information in one place for the convenience of the reader. The following table summarizes the
five adaptive structures, their input signals, their times of computation and the time at which the
updated values are first used. For reference, the fourth column in the table refers to the block
numbers used in the figures and in sections 3, 4 and 5 as a cross reference to these computations.

By far, the largest amount of computation is expended in updating the 50th order synthesis
filter. The input signal required is the synthesis filter output speech (ST). As soon as the fourth
vector in the previous cycle has been decoded, the hybrid window method for computing the
autocorrelation coefficients can commence (block 49). When it is completed, Durbin's recursion
to obtain the prediction coefficients can begin (block 50). In practice we found it necessary to
stretch this computation over more than one vector cycle. We begin the hybrid window
computation before vector 1 has been fully received. Before Durbin's recursion can be fully
completed, we must interrupt it to encode vector 1. Durbin's recursion is not completed until
vector 2. Finally bandwidth expansion (block 51) is applied to the predictor coefficients. The
results of this calculation are not used until the encoding or decoding of vector 3 because in the
encoder we need to combine these updated values with the update of the perceptual weighting
filter and code vector energies. These updates are not available until vector 3.

The gain adaptation proceeds in two fashions. The adaptive predictor is updated once every
four vectors. However, the adaptive predictor produces a new gain value once per vector. In this
section we are describing the timing of the update of the predictor. To compute this requires first
performing the hybrid window method on the previous log gains (block 43), then Durbin's
recursion (block 44), and bandwidth expansion (block 45). All of this can be completed during
vector 2 using the log gains available up through vector 1. If the result of Durbin's recursion
indicates there is no singularity, then the new gain predictor is used immediately in the encoding
of vector 2.

Timing of Adapter Updates

The perceptual weighting filter update is computed during vector 3. The first part of this
update is performing the LPC analysis on the input speech up through vector 2. We can begin this
computation immediately after vector 2 has been encoded, not waiting for vector 3 to be fully
received. This consists of performing the hybrid window method (block 36), Durbin's recursion
(block 37) and the weighting filter coefficient calculations (block 38). Next we need to combine
the perceptual weighting filter with the updated synthesis filter to compute the impulse response
vector calculator (block 12). We also must convolve every shape codevector with this impulse
response to find the codevector energies (blocks 14 and 15). As soon as these computations are
completed, we can immediately use all of the updated values in the encoding of vector 3. (Note:
Because the computation of codevector energies is fairly intensive, we were unable to complete
the perceptual weighting filter update as part of the computation during the time of vector 2, even
if the gain predictor update were moved elsewhere. This is why it was deferred to vector 3.)

The long term adaptive postfilter is updated on the basis of a fast pitch extraction algorithm
which uses the synthesis filter output speech (ST) for its input. Since the postfilter is only used in
the decoder, scheduling time to perform this computation was based on the other computational
loads in the decoder. The decoder does not have to update the perceptual weighting filter and
codevector energies, so the time slot of vector 3 is available. The codeword for vector 3 is
decoded and its synthesis filter output speech is available together with all previous synthesis
output vectors. These are input to the adapter which then produces the new pitch period (blocks
81 and 82) and long-term postfilter coefficient (blocks 83 and 84). These new values are
immediately used in calculating the postfiltered output for vector 3.

The short term adaptive postfilter is updated as a by-product of the synthesis filter update.
Durbin's recursion is stopped at order 10 and the prediction coefficients are saved for the postfilter
update. Since the Durbin computation is usually begun during vector 1, the short term adaptive
postfilter update is completed in time for the postfiltering of output vector 1.
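The timing described in this annex can be condensed into a small lookup, sketched below. The vector numbers are read from the prose above. The dictionary layout, the adapter labels and the function name `adapters_switching_at` are hypothetical and are not taken from the Recommendation's own table.

```python
# Vector (1..4) within each four-vector adaptation cycle at which each adapter's
# updated values are first used, as read from the prose of this annex.
FIRST_USE = {
    "synthesis filter": 3,
    "vector gain predictor": 2,
    "perceptual weighting filter": 3,
    "long-term postfilter": 3,
    "short-term postfilter": 1,
}

# Which side of the coder runs each adapter (encoder: three adapters, decoder: four).
ENCODER = {"synthesis filter", "vector gain predictor", "perceptual weighting filter"}
DECODER = {"synthesis filter", "vector gain predictor",
           "long-term postfilter", "short-term postfilter"}


def adapters_switching_at(vector, side):
    """Return the sorted adapters whose new values take effect at `vector` on `side`."""
    members = ENCODER if side == "encoder" else DECODER
    return sorted(name for name in members if FIRST_USE[name] == vector)


if __name__ == "__main__":
    for side in ("encoder", "decoder"):
        for vector in range(1, 5):
            print(side, vector, adapters_switching_at(vector, side))
```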
Figure 1/G.728 Simplified Block Diagram of LD-CELP Coder
Figure 4(b)/G.728 Illustration of a hybrid window



[Figure omitted. Recoverable block labels: Hybrid Windowing Module; Levinson-Durbin Recursion
Module; Bandwidth Expansion Module (51); output: Synthesis Filter Coefficients.]

Figure 5/G.728 Backward Synthesis Filter Adapter
Figure 6/G.728 Backward Vector Gain Adapter
Figure 7/G.728 Postfilter Block Schematic
APPENDIX I
(to Recommendation G.728)



IMPLEMENTATION VERIFICATION 



A set of verification tools have been designed in order to facilitate the compliance verification
of different implementations to the algorithm defined in this Recommendation. These verification
tools are available from the ITU on a set of distribution diskettes.

Implementation verification

This Appendix describes the digital test sequences and measurement software to be used for
implementation verification. These verification tools are available from the ITU on a set of
verification diskettes.



i fcr ixr.c.T.f.-^jc." 



15 



I.1 Verification principle

The LD-CELP algorithm specification is formulated in a non-bitexact manner to allow for simple implementation
on different kinds of hardware. This implies that the verification procedure can not assume the implementation under
test to be exactly equal to any reference implementation. Hence, objective measurements are needed to establish the
degree of deviation between test and reference. If this measured deviation is found to be sufficiently small, the test
implementation is assumed to be interoperable with any other implementation passing the test. Since no finite length
test is capable of testing every aspect of an implementation, 100% certainty that an implementation is correct can never
be guaranteed. However, the test procedure described exercises all main parts of the LD-CELP algorithm and should be
a valuable tool for the implementor.

The verification procedures described in this appendix have been designed with 32 bit floating-point
implementations in mind. Although they could be applied to any LD-CELP implementation, 32 bit floating-point format
will probably be needed to fulfill the test requirements. Verification procedures that could permit a fixed-point
algorithm to be realized are currently under study.



I.2 Test configurations

This section describes how the different test sequences and measurement programs should be used together to
perform the verification tests. The procedure is based on black-box testing at the interfaces SU and ICHAN of the test
encoder and ICHAN and SPF of the test decoder. The signals SU and SPF are represented in 16 bits fixed point precision
as described in Section I.4.2. A possibility to turn off the adaptive postfilter should be provided in the tested decoder
implementation. All test sequence processing should be started with the test implementation in the initial reset state as
defined by the LD-CELP recommendation. Three measurement programs, CWCOMP, SNR and WSNR, are needed to
perform the test output sequence evaluations. These programs are further described in Section I.3. Descriptions of the
different test configurations to be used are found in the following subsections (I.2.1-I.2.4).

This section describes the programs CWCOMP, SNR and WSNR, referred to in the test configuration section, as
well as the program LDCDEC provided as an implementor's debugging tool.

The verification software is written in Fortran and is kept as close to the ANSI Fortran 77 standard as possible.
Double precision floating point resolution is used extensively to minimize numerical error in the reference LD-CELP
modules. The programs have been compiled with a commercially available Fortran compiler to produce executable
versions for 386/87-based PCs. The READ.ME file in the distribution describes how to create executable programs on
other computers.

I.3.1 CWCOMP

The CWCOMP program is a simple tool to compare the content of two codeword files. The user is prompted for
two codeword file names, the reference encoder output (filename in last column of Table I-1/G.728) and the test encoder
output. The program compares each codeword in these files and writes the comparison result to terminal. The requirement
for test configuration 1 is that no different codewords should exist.
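The comparison that CWCOMP is described as performing can be sketched in a few lines. This is not the ITU program, which is written in Fortran. The function name `compare_codewords` and its return convention are hypothetical, and masking each word to its low 10 bits is an assumption based on the file format described in this appendix.

```python
CODEWORD_MASK = 0x3FF  # low 10 bits of each 16 bit word carry the codeword (assumed)


def compare_codewords(reference, test):
    """Compare two codeword sequences element by element.

    Returns (mismatch_indices, same_length). Configuration 1 passes in this
    sketch only when mismatch_indices is empty and same_length is True.
    """
    mismatches = [
        i
        for i, (r, t) in enumerate(zip(reference, test))
        if (r & CODEWORD_MASK) != (t & CODEWORD_MASK)
    ]
    return mismatches, len(reference) == len(test)


if __name__ == "__main__":
    print(compare_codewords([1, 2, 3], [1, 5, 3]))  # one differing codeword at index 1
```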

I.3.2 SNR

The SNR program implements a signal-to-noise ratio measurement between two signal files. The first is a
reference file provided by the reference decoder program, and the second is the test decoder output file. A global SNR,
GLOB, is computed as the total file signal-to-noise ratio. A segmental SNR, SEG256, is computed as the average
signal-to-noise ratio of all 256-sample segments with reference signal power above a certain threshold. Minimum
segment SNRs are found for segments of length 256, 128, 64, 32, 16, 8 and 4 with power above the same threshold.

To run the SNR program, the user needs to enter names of two input files. The first is the reference decoder
output file as described in the last column of Table I-3/G.728. The second is the decoded output file produced by the
decoder under test. After processing the files, the program outputs the different SNRs to terminal. Requirement values
for the test configurations 2 and 4 are given in terms of these SNR numbers.
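The global and segmental measurements described for the SNR program can be sketched as follows. This is an illustration under stated assumptions, not the ITU program. The text only says "a certain threshold", so the default `power_threshold` of 1.0 is a placeholder, and the function names are hypothetical.

```python
import math


def snr_db(ref, test):
    """Signal-to-noise ratio in dB of `test` against `ref` over the whole sequence."""
    signal = sum(r * r for r in ref)
    noise = sum((r - t) ** 2 for r, t in zip(ref, test))
    if noise == 0:
        return math.inf       # identical sequences
    if signal == 0:
        return -math.inf      # silent reference with a non-silent error
    return 10.0 * math.log10(signal / noise)


def segment_snrs_db(ref, test, seg_len=256, power_threshold=1.0):
    """SNR of each non-overlapping segment whose mean reference power exceeds the threshold."""
    values = []
    for start in range(0, len(ref) - seg_len + 1, seg_len):
        r = ref[start:start + seg_len]
        t = test[start:start + seg_len]
        if sum(v * v for v in r) / seg_len > power_threshold:
            values.append(snr_db(r, t))
    return values


if __name__ == "__main__":
    ref = [0.0] * 256 + [100.0] * 256
    out = [1.0] * 256 + [99.0] * 256
    segs = segment_snrs_db(ref, out)
    print("global SNR (dB):", snr_db(ref, out))
    print("segmental mean and minimum (dB):", sum(segs) / len(segs), min(segs))
```

The segmental figure is the mean of the returned list and the minimum segment SNR is its minimum. Calling `segment_snrs_db` with `seg_len` set to 128, 64 and so on gives the shorter-segment minima.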

I.3.3 WSNR

The WSNR algorithm is based on a reference decoder and distance measure implementation to compute the mean
perceptually weighted distortion of a codeword sequence. A logarithmic signal-to-distortion ratio is computed for every
5-sample signal vector, and the ratios are averaged over all signal vectors with energy above a certain threshold.

To run the WSNR program, the user needs to enter names of two input files. The first is the encoder input signal
file (first column of Table I-1/G.728) and the second is the encoder output codeword file. After processing the sequence,
WSNR writes the output WSNR value to terminal. The requirement value for test configuration 3 is given in terms of
this WSNR number.

I.3.4 LDCDEC

In addition to the three measurement programs, the distribution also includes a reference decoder demonstration
program, LDCDEC. This program is based on the same decoder subroutine as WSNR and could be modified to monitor
variables in the decoder for debugging purposes. The user is prompted for the input codeword file, the output signal file
and whether to include the adaptive postfilter or not.

I.4 Test sequences

The following is a description of the test sequences to be applied. The description includes the specific
requirements for each sequence.

The test sequences are numbered sequentially, with a prefix that identifies the type of signal:

IN: encoder input signal

INCW: encoder output codewords

CW: decoder input codewords

OUTA: decoder output signal without postfilter

OUTB: decoder output signal with postfilter

All test sequence files have the extension *.BIN.


I.4.2 File formats

The signal files, according to the LD-CELP interfaces SU and SPF (file prefix IN, OUTA and OUTB), are all in
2's complement 16 bit binary format and should be interpreted to have a fixed binary point between bit #2 and #3, as
shown in Figure I-5/G.728. Note that all the 16 available bits must be used to achieve maximum precision in the test
measurements.

The codeword files (LD-CELP signal ICHAN, file prefix CW or INCW) are stored in the same 16 bit binary
format as the signal files. The least significant 10 bits of each 16 bit word represent the 10 bit codeword, as shown in
Figure I-5/G.728. The other bits (#12-#15) are set to zero.

Both signal and codeword files are stored in the low-byte first word storage format that is usual on IBM/DOS and
VAX/VMS computers. For use on other platforms, such as most UNIX machines, this ordering may have to be changed
by a byte-swap operation.
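A minimal sketch of reading these files is given below. It assumes bits are numbered from 0 at the least significant end, so that a binary point between bit #2 and bit #3 leaves three fractional bits, and it assumes two's complement samples. Both points should be checked against Figure I-5/G.728. The function names are hypothetical.

```python
import struct


def decode_signal(raw, low_byte_first=True):
    """Decode 16 bit signed samples and scale by 1/8 (three assumed fractional bits)."""
    count = len(raw) // 2
    order = "<" if low_byte_first else ">"
    ints = struct.unpack("%s%dh" % (order, count), raw[:2 * count])
    return [v / 8.0 for v in ints]


def decode_codewords(raw, low_byte_first=True):
    """Decode 16 bit words and keep the least significant 10 bits of each."""
    count = len(raw) // 2
    order = "<" if low_byte_first else ">"
    words = struct.unpack("%s%dH" % (order, count), raw[:2 * count])
    return [w & 0x3FF for w in words]


if __name__ == "__main__":
    print(decode_signal(struct.pack("<2h", -8, 12)))  # two samples, low byte first
    print(decode_codewords(b"\xff\x03\x01\x00"))      # two codewords, low byte first
```

Passing `low_byte_first=False` corresponds to reading a file that has already been byte-swapped for a high-byte-first platform.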

I.4.3 Test sequences and requirements

The tables in this section describe the complete set of tests to be performed to verify that an implementation of
LD-CELP follows the specification and is interoperable with other correct implementations. Table I-1/G.728 is a summary
of the encoder tests sequences. The corresponding requirements are expressed in Table I-2/G.728. Table I-3/G.728 and
I-4/G.728 contain the decoder test sequence summary and requirements.
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Ail eve filca m iianouoon ire core4 ji r-vc I Vfby^ 3.5" DOS dmrnei >ixcii£ copies :ir. x jr;;rrc 
Ton „*v .TV u fodc*\ng *aaresi: 

TV Gcnerti Sccr^te-Jf 
Saia Scnnce 
*0 p'jce du Siaocs 

ch-;:u Gct«v« 2) 
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:ompik and link Jie program*. Exeensau are used to separue diffarm fde rypa. '.FOR fii« arc source :>^» :cr 
forxan program*, # £XE fiiea sre tiacuabta and '.BIN are binary £fi lequenc* fiks- TV content or e-*:.-, cu;: 

* xs Us in Tabte !•<<!. ?2S. 
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Claims 

1. A method of generating linear prediction filter coefficient signals during frame erasure, the generated
linear prediction coefficient signals for use by a linear prediction filter in synthesizing a speech signal, the
method comprising the steps of:

storing linear prediction coefficient signals in a memory, said linear prediction coefficient signals
generated responsive to a speech signal corresponding to a non-erased frame; and

responsive to a frame erasure, scaling one or more of said stored linear prediction coefficient
signals by a scale factor, BEF raised to an exponent i, where 0.95 ≤ BEF ≤ 0.99 and where i indexes the stored
linear prediction coefficient signals, the scaled linear prediction coefficient signals applied to the linear
prediction filter for use in synthesizing the speech signal.

2. The method of claim 1 wherein BEF is substantially equal to 0.97. 

3. The method of claim 1 wherein BEF is substantially equal to 0.98.

4. The method of claim 1 wherein the linear prediction filter comprises a 50th order linear prediction filter 
and said exponent indexes 50 linear prediction coefficient signals. 

5. The method of claim 1 wherein the linear prediction filter comprises a filter of an order greater than 20
and said exponent indexes a number of linear prediction coefficient signals, the number equal to the order 
of the filter. 






6. The method of claim 1 wherein the step of scaling is performed once per erased frame. 
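The scaling recited in claims 1 to 6 can be illustrated with a minimal sketch: each stored coefficient with index i is multiplied by BEF raised to the power i. Indexing the coefficients from 1, the default BEF of 0.97 (the value of claim 2) and the function name `expand_bandwidth` are choices made for this illustration only.

```python
def expand_bandwidth(lpc, bef=0.97):
    """Scale stored LPC coefficients a_1..a_N by bef**i, with i counted from 1.

    `lpc` holds the coefficients saved from the last non-erased frame. The range
    check mirrors the limits recited in claim 1.
    """
    if not 0.95 <= bef <= 0.99:
        raise ValueError("BEF must lie between 0.95 and 0.99")
    return [a * bef ** i for i, a in enumerate(lpc, start=1)]


if __name__ == "__main__":
    stored = [1.2, -0.8, 0.3]             # hypothetical stored coefficients
    first = expand_bandwidth(stored)      # first erased frame
    second = expand_bandwidth(first)      # a second consecutive erased frame
    print(first)
    print(second)
```

In this sketch, applying the step once per erased frame (claim 6) compounds the scaling over consecutive erasures: after k erased frames, coefficient i has been multiplied by BEF to the power i·k.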



7. A method of synthesizing a signal reflecting human speech, the method for use by a decoder which
experiences an erasure of input bits, the decoder including a first excitation signal generator responsive to
said input bits and a synthesis filter responsive to an excitation signal, the method comprising the steps
of:

storing samples of a first excitation signal generated by said first excitation signal generator;

responsive to a signal indicating the erasure of input bits, synthesizing a second excitation signal
based on previously stored samples of the first excitation signal; and

filtering said second excitation signal to synthesize said signal reflecting human speech;
wherein the step of synthesizing a second excitation signal includes the steps of:

correlating a first subset of samples stored in said memory with a second subset of samples stored
in said memory, at least one of said samples in said second subset being earlier than any sample in said
first subset;

identifying a set of stored excitation signal samples based on a correlation of first and second
subsets;

forming said second excitation signal based on said identified set of excitation signal samples.

8. The method of claim 7 wherein the step of forming said second excitation signal comprises copying said
identified set of stored excitation signal samples for use as samples of said second excitation signal.

9. The method of claim 7 wherein said identified set of stored excitation signal samples comprises five
consecutive stored samples.

10. The method of claim 7 further comprising the step of storing samples of said second excitation signal in
said memory.

11. The method of claim 7 further comprising the step of determining whether erased input bits likely represent
non-voiced speech.

12. The method of claim 7 wherein:

the step of correlating comprises determining a time lag value between first and second subsets
of samples corresponding to a maximum correlation; and

the step of identifying a set of stored excitation signal samples comprises identifying said samples
based on said time lag value.

13. The method of claim 12 further comprising the steps of:

in accordance with a test, determining whether erased input bits likely represent a signal of very
low periodicity; and

if erased input bits are determined to represent a signal of very low periodicity, modifying said time
lag value.

14. The method of claim 13 wherein said test comprises comparing a weight of a single tap pitch predictor to
a threshold.

15. The method of claim 13 wherein said test comprises comparing the maximum correlation to a threshold.

16. The method of claim 13 wherein the step of modifying said time lag value comprises incrementing said
time lag value.
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FIG. 3 
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FIG. 4 



[Flowchart; layout lost in extraction. Recoverable box text in reading order, reference numerals
1230 through 1246:]

  Entry: "NO" branch from decision 1201.

  Compute correlation between the block of the last 30 samples of ETPAST and every other block of
  30 samples of ETPAST which lags the first block by between 31 and 170 samples in the past.

  For all values of correlation greater than threshold THC, determine the time (lag) of maximum
  correlation, MAXI.

  Decision: PTAP < VTH1?

  Decision: maximum correlation at MAXI < MAXC?

  Increment MAXI.

  Count MAXI samples backward in ETPAST; select 5 consecutive samples for ET.

  Update ETPAST with ET.

  Decision: need more samples to fill erased frame?

  Decision: is next frame erased?
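The correlation search and sample selection of FIG. 4 can be sketched as follows. The block length of 30, the lag range 31 to 170 and the 5-sample vector are taken from the flowchart text. The sketch deliberately omits the THC threshold and the PTAP/MAXC tests that increment the lag, and the function names are hypothetical. It illustrates the idea and is not the patent's implementation.

```python
def best_lag(etpast, block=30, min_lag=31, max_lag=170):
    """Return (lag, correlation) for the lag whose 30-sample block best matches the latest block.

    `etpast` is the stored excitation history, oldest sample first. Ties are
    resolved in favour of the smallest lag (a choice made for this sketch).
    """
    n = len(etpast)
    recent = etpast[n - block:]
    best = None
    for lag in range(min_lag, max_lag + 1):
        start = n - block - lag
        if start < 0:
            break  # not enough history for this lag
        candidate = etpast[start:start + block]
        corr = sum(a * b for a, b in zip(recent, candidate))
        if best is None or corr > best[1]:
            best = (lag, corr)
    return best


def extrapolate_vector(etpast, lag, vec_len=5):
    """Count `lag` samples back in etpast, copy vec_len consecutive samples as ET,
    then append ET to etpast (the "update ETPAST with ET" step)."""
    start = len(etpast) - lag
    et = etpast[start:start + vec_len]
    etpast.extend(et)
    return et


if __name__ == "__main__":
    period = [0.0] * 40
    period[20] = 1.0                      # one pulse per 40-sample period
    history = period * 6                  # 240 samples of stored excitation
    lag, corr = best_lag(history)
    for _ in range(8):                    # 8 vectors of 5 samples fill 40 samples
        extrapolate_vector(history, lag)
    print(lag, corr, history[240:] == period)
```

Because every lag is at least 31 samples and each vector is only 5 samples long, the copied samples always come from already stored history. Appending each synthesized vector before selecting the next one lets the extrapolation continue for as long as more samples are needed.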



FIG. 5 
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